Re: [DNSOP] CLIENT-SUBNET bis appetite?

Lanlan Pan <abbypan@gmail.com> Sat, 16 December 2017 11:12 UTC

MIME-Version: 1.0
References: <20171214173913.GA18100@jurassic.lan.banu.com> <20171214180058.GC4176@server.ds9a.nl> <20171215165008.GA11449@jurassic.lan.banu.com> <CAHw9_iKb1Chgv6DW+oFJt-mdtTWuJ2K=RdKck93CYdsMKup7kQ@mail.gmail.com>
In-Reply-To: <CAHw9_iKb1Chgv6DW+oFJt-mdtTWuJ2K=RdKck93CYdsMKup7kQ@mail.gmail.com>
From: Lanlan Pan <abbypan@gmail.com>
Date: Sat, 16 Dec 2017 11:11:56 +0000
Message-ID: <CANLjSvXykP=ZZDo0MEBthRwb7H+QLbjL7OdgoEPpefjjcRAO_w@mail.gmail.com>
To: Warren Kumari <warren@kumari.net>
Cc: Mukund Sivaraman <muks@isc.org>, dnsop <dnsop@ietf.org>, bert hubert <bert.hubert@powerdns.com>
Content-Type: multipart/alternative; boundary="f403045c202410bf6605607330da"
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/yvEZ4Tp1wneo_6Dm8diqP-t8KUI>
Subject: Re: [DNSOP] CLIENT-SUBNET bis appetite?

Hi Warren,

Thank you for the mention :-)

We all know that, because of network topology, the client subnet is the *best*
indicator for CDN traffic management, especially for Akamai, Google, ...
I totally agree with you: there is no *best* answer for a country, nor for a
city or even a postal address.

My thought is:
GeoIP-enabled Authoritative Servers often map the Client Subnet into
EIL <COUNTRY, AREA, ISP>, then return a tailored response based only on
<COUNTRY, AREA, ISP>.
There is a *sufficient* answer for <COUNTRY, AREA, ISP>, from a GeoIP-enabled
Authoritative Server's *own view*.
EIL can be a trade-off choice for a GeoIP-enabled Authoritative Server, and
can reduce the Recursive Resolver's cache cost.
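
A minimal sketch of that mapping, in Python, may make it more concrete. The
GeoIP table, the answer table and the function names below are all
hypothetical, made up for illustration (the addresses in the answers are
documentation prefixes); a real GeoIP-enabled Authoritative Server would use
its own database and policy.

```python
import ipaddress

# Hypothetical GeoIP table: prefix -> (COUNTRY, AREA, ISP). Not real data.
GEOIP = {
    ipaddress.ip_network("111.201.0.0/16"): ("CN", "Beijing", "Unicom"),
    ipaddress.ip_network("203.0.113.0/24"): ("US", "Virginia", "ExampleNet"),
}

# Hypothetical tailored answers, keyed by the EIL tuple only.
ANSWERS = {
    ("CN", "Beijing", "Unicom"): "192.0.2.10",
    ("US", "Virginia", "ExampleNet"): "192.0.2.20",
}
DEFAULT_ANSWER = "192.0.2.1"


def subnet_to_eil(client_subnet):
    """Return the EIL tuple of the longest GeoIP prefix covering client_subnet, or None."""
    net = ipaddress.ip_network(client_subnet)
    best = None
    for prefix, eil in GEOIP.items():
        # Keep the most specific (longest) covering prefix seen so far.
        if net.subnet_of(prefix) and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, eil)
    return best[1] if best else None


def tailored_answer(client_subnet):
    """Pick the answer from the EIL tuple alone; fall back to a default answer."""
    return ANSWERS.get(subnet_to_eil(client_subnet), DEFAULT_ANSWER)
```

Here tailored_answer("111.201.133.0/24") and tailored_answer("111.201.7.0/24")
return the same answer, because the server only ever looks at the tuple.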

The latest EIL document is at:
https://github.com/abbypan/dns_test_eil/blob/master/ietf_draft/draft.txt
WG experts currently have many compatibility, cost vs. benefit, and
operational concerns about EIL. I am *still revising* it; discussion and
issues are welcome.

Example:
xxx.com's Authoritative Server deploys GeoDNS proactively.
=> xxx.com's Authoritative Server itself chooses to accept *<COUNTRY, AREA,
ISP>*-level traffic management.
=> for some operational problems/privacy concerns (maybe like those Mukund
mentioned above), we can use EIL to mitigate them.

ECS: I am topologically close to <111.201.133.0/24>; which is the best
answer for this network topology now?
GeoIP-enabled Authoritative Servers map <111.201.133.0/24> into <China,
Beijing, Unicom ISP>, then find the tailored answer.
The Recursive Resolver caches the response with <111.201.133.0/24>, or some
shorter prefix.

EIL: I am topologically close to <China, Beijing, Unicom ISP>; which is
the best answer for this network topology now?
The Recursive Resolver caches the response with <China, Beijing, Unicom ISP>,
which can cover many *topologically close* client subnets.
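
The cache effect can be sketched with a toy resolver cache in Python. This is
only an illustration under simplifying assumptions: the Cache class is
hypothetical, TTLs are ignored, the ECS cache keys on the full /24 (no
shorter scope prefix is returned), and all 256 example subnets are assumed to
map to the same <COUNTRY, AREA, ISP>.

```python
class Cache:
    """Toy resolver cache; key_fn decides what the response is cached under."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, qname, client_subnet):
        key = (qname, self.key_fn(client_subnet))
        if key in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = "answer"  # placeholder for the upstream response
        return self.entries[key]


def eil_of(client_subnet):
    # Assumption: every subnet in this example is in the same EIL tuple.
    return ("CN", "Beijing", "Unicom")


# 256 hypothetical client /24s behind one resolver.
subnets = [f"111.201.{i}.0/24" for i in range(256)]

ecs_cache = Cache(key_fn=lambda subnet: subnet)  # ECS-style: keyed by client subnet
eil_cache = Cache(key_fn=eil_of)                 # EIL-style: keyed by the tuple

for subnet in subnets:
    ecs_cache.lookup("xxx.com", subnet)
    eil_cache.lookup("xxx.com", subnet)

print(len(ecs_cache.entries), ecs_cache.hits)  # 256 entries, 0 hits
print(len(eil_cache.entries), eil_cache.hits)  # 1 entry, 255 hits
```

With these assumptions the subnet-keyed cache stores 256 entries and never
hits, while the tuple-keyed cache stores one entry and hits 255 times.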

On Sat, Dec 16, 2017 at 3:34 AM, Warren Kumari <warren@kumari.net> wrote:

> On Fri, Dec 15, 2017 at 11:50 AM, Mukund Sivaraman <muks@isc.org> wrote:
> > On Thu, Dec 14, 2017 at 07:00:58PM +0100, bert hubert wrote:
> >> On Thu, Dec 14, 2017 at 11:09:13PM +0530, Mukund Sivaraman wrote:
> >> > Any appetite for it? Don't throw things at me.. I ask because the
> >> > current thing is slowly getting more widely deployed and there are
> >> > design issues that can do with an ECS2 that breaks from ECS1 protocol. I
> >> > ask because I'm once again having to deal with myriad implementation
> >> > cases and dislike it.
> >>
> >> Could you elaborate what you dislike most?
> >
> > It is too complicated to implement ECS correctly. There are a large
> > number of corner cases. The things that resolvers and authoritative
> > sides have to take care of are quite different. It is more complex
> > than anything else in DNS.
> >
> > I think this should be built again from scratch.
> >
> >> The biggest thing we are noticing is that while it does great things
> >> to getting to a server the content provider likes, it unavoidably
> >> drives down cache hitrates a lot, introducing a latency penalty.
> >>
> >> The operators we see deploying ECS have tens of thousands of subnets
> >> which all need to be mapped to only a few servers. But you still end up
> >> with tens of thousands of cache entries and therefore tiny cache hitrates.
> >>
> >> Such things could be addressed by answering with lists of subnet masks to
> >> which this answer would also apply, but this makes little sense
> >> operationally I think.
> >
> > Firstly, correct deaggregation is an important requirement of reducing
> > cache usage. With the current design of ECS protocol, it's very
> > important that correct disjoining of prefixes be done optimally to avoid
> > cache pollution, yet the draft does not specify a suitable algorithm for
> > it (we know how to do it, I think the draft should have stated it).
> >
> > A /n address prefix as specified by ECS option is a perfect binary
> > tree of 1<<n addresses. To correctly deaggregate 0.0.0.0/0 (scope=0)
> > data from a longer prefix such as 10.0.0.0/24, this will result in all
> > these answers to be generated:
> >
> > * 10.0.0.0/24 answer
> > * 10.0.0.0/23 exact match answer (scope > source)
> > * 10.0.1.0/23 answer
> > * 10.0.0.0/22 exact match answer (scope > source)
> > * 10.0.3.0/22 answer
> >
> > and so on.. there are about 2n+1 answers necessary so that a 0.0.0.0/0
> > answer does not override a /n client from receiving its specific answer.
> >
> >                   x
> >            x            x
> >          x   x        x   x
> >         x x x x      x x y y
> >
> > If ECS option had more fields, we could have put the above pattern as
> > a difference of trees (with direction bit and height, e.g., "x"s in
> > the diagram above) and it would have reduced cache usage
> > considerably.
> >
> > This can be generalized with more differences but anyway I think that
> > using QUERY for ECS is a badly done idea (not even mentioning privacy
> > loss).
> >
> > As an example, RPZ does not rely on queries.. it transfers all prefixes
> > for matching in a zone so that the longest prefix match algorithm will
> > not suffer from a previously cached shorter prefix matching and
> > preventing future fetches.
> >
> > Another related problem is this: We often want to match against a GeoIP
> > database (containing what may be changing but maintained prefixes) in
> > associating zone data with geographic/network-topo clients. We want to
> > say "serve this answer for country X or city Y or ASNNNN" and we don't
> > care about managing the actual prefixes.
> >
> > GeoIP is a custom database format, where I can match against GeoIP, but
> > I can't easily deaggregate all its prefixes for an ECS zone to be cached
> > properly.
> >
> > I feel it would have been better to say to a downstream resolver "This
> > answer is for country X", or "A is the answer for country X, B is the
> > answer for country Y, the rest use answer C".
>
> <No hats. other than working at a place that uses ECS to provide
> optimized answers>
> Please see the thread
> https://www.ietf.org/mail-archive/web/dnsop/current/msg19614.html
>
> There is no *best* answer for a country, nor a city or even a postal
> address.
>
> There is instead a best answer for a specific IP address / subnet,
> which depends upon the ISP, the peering connections with that ISP, the
> utilization of the peering links with that ISP, the utilization of the
> datacenter, and then much lower down, where in the physical topology
> that subnet resides.
>
> I'm within spitting distance of Ashburn Equinix (well, I cannot quite
> spit there, but I could easily walk there, it's < 5 miles), but my ISP
> is Comcast. This means that, instead of hitting anything in IAD
> (Ashburn), I (currently) instead get sent to MRN (Lenoir, North
> Carolina), which is 400 miles away. My packets also happen to go
> through 111 8th Ave, NY - even though there is stuff in NY, the MRN
> location is better / faster.
>
> If I happened to have chosen a different ISP, Verizon for instance, my
> optimal answers would be very different.
> And no, the "obvious" answer of "just use the AS number then" fails
> equally badly - my latency to 8.8.8.8 (to choose something at random)
> is ~12ms. If I were a Comcast customer in California and the same
> location were handed out, the latency would be ~80ms.
>
> Note that I'm in a place which is well connected - in many places, the
> "less optimal" answer is much much less optimal...
>
>
> >
> > The design of ECS needs to be reconsidered. I'd prefer something like a
> > zone format for it, than using QUERY. QUERY cannot give complete
> > information about all prefixes and there is a possibility of incorrect
> > caching, and a very high probability of redundant cache pollution.
>
> Sure, happy to reconsider the design, but it is important to know more
> about what the constraints are, and how specific and dynamic the
> answers are.
>
> For example, Akamai says they are in 1,600 networks
> (https://www.akamai.com/uk/en/about/facts-figures.jsp). "The
> Cloudflare Global Anycast Network"
> (https://www.cloudflare.com/network/) "is powered by 118 data
> centers around the world."
>
> "Google Cloud Platform has added new regions in São Paulo and Mumbai.
> GCP has 13 regions, 39 zones, over 100 points of presence, and a
> well-provisioned global network with 100,000s of miles of fiber optic
> cable." (embarrassingly I couldn't easily find a public page with more
> detail)
>
> Dyn has a map here: https://dyn.com/dns/network-map/
>
>
>
> >
> > From a resolver's point of view, a non-ECS answer (no client-subnet
> > option) is different from an ECS answer with scope=0 which is different
> > from an ECS answer with source=0, whereas all these may be the same from
> > an authority's point of view. They all need to be cached differently
> > (from an intermediate resolver's view).
> >
> > This thing of scope > source meaning for-exact-match-only is weird as
> > hell when implementing longest prefix matching. It is not convenient
> > to use an off-the-shelf radix tree.
> >
> > ECS relies on the option always being returned for any kind of
> > answers, as some resolvers use that as an indicator of ECS support
> > (and stop using ECS if it ever stops). But ECS does not apply to
> > several kinds of answers (e.g., anything but NOERROR, esp. NXDOMAIN
> > and NODATA have to be consistent across all prefixes.) It doesn't
> > apply to SOA, DNSKEY, NS in answer section, referrals, etc. Yet,
> > many of these need to answer with SCOPE=0.
> >
> > An ACL config option about whether the NS supports ECS or not (to
> > return the option or not) is different from a config option whether
> > the NS passes through ECS or not: the latter would always pass through
> > SOURCE=0 but return REFUSED for any ECS queries that didn't match the
> > ACL, whereas the former would return a non-ECS reply for any ECS
> > queries that didn't match the ACL.
> >
> > Transitivity of the option has corner cases.
> >
> > I don't have to point out how easy it is for an erroneous /16 to
> > prevent queries to /24 answers shadowed by the /16.
> >
> > Some cache cases: Obviously an ECS cache is different from a
> > zone.. it's not from a single zone, it is not an atomic collection of
> > a single version of zone and ever changing. If there's a /24 answer in
> > cache, and a newer query brings in a /16 answer that shadows it,
> > should the resolver assume that the /16 has precedence because it's
> > newer (hence the /24 should no longer exist) or do a
> > longest-prefix-match against the older /24? What if the /16 then
> > expires and the /24 hasn't expired? An NXDOMAIN answer should expire
> > any previously cached prefix-specific cache entries for that name. A
> > NODATA answer should expire any previously cached prefix-specific
> > cache entries for that type.  Non-ECS data is different from SCOPE=0
> > data. There are questions about trust ranking with usage of ECS data.
> >
> > These are just some topics that I can quickly think of. There are many
> > other issues we faced and discussed during resolver ECS development.
> >
> > The draft leaves many things unspecified, such as more clarity in DNSSEC
> > and handling of negative answers. Many issues were fixed during the
> > draft phase, but I feel it was insufficient.
> >
> >> Can you share your ideas for ECS2?
> >
> > There are many quirks in ECS. I don't want to propose specific ideas
> > now, except that we should gather requirements and start from
> > scratch.
>
> Yes, much of my soapbox rant was about just this -- understanding the
> requirements is important - the reason that CDNs provide different
> answers based upon the IP address is that it is a proxy for latency /
> performance.
> I'm sure we got many things wrong in ECS, but a redesign needs to be
> informed by the use case and requirements.
>
> (This mail not meant to sound as grumpy as it turned out :-) )
>
> W
>
> > We have to reduce complexity of the protocol on both auth and
> > caching resolver sides. I think it should be designed again from
> > requirements without being a tweak of ECS1. The current protocol
> > complicates DNS implementation significantly.
> >
> >                 Mukund
> >
> > _______________________________________________
> > DNSOP mailing list
> > DNSOP@ietf.org
> > https://www.ietf.org/mailman/listinfo/dnsop
>
>
>
> --
> I don't think the execution is relevant when it was obviously a bad
> idea in the first place.
> This is like putting rabid weasels in your pants, and later expressing
> regret at having chosen those particular rabid weasels and that pair
> of pants.
>    ---maf
>
-- 
致礼  Best Regards

潘蓝兰  Pan Lanlan