Re: [alto] Warren Kumari's Discuss on draft-ietf-alto-xdom-disc-04: (with DISCUSS)

Sebastian Kiesel <> Tue, 18 December 2018 23:23 UTC

Date: Wed, 19 Dec 2018 00:23:00 +0100
From: Sebastian Kiesel <>
To: Warren Kumari <>
Cc: The IESG <>,, Jan Seedorf <>,,
Subject: Re: [alto] Warren Kumari's Discuss on draft-ietf-alto-xdom-disc-04: (with DISCUSS)


Apologies if I restate something obvious, but the key issue for this
discussion is probably that the XDOM procedure is NOT supposed to be run
on your laptop, but instead on a centralized server such as a Content
Delivery Network's (CDN) HTTP redirect server, a P2P tracker, etc.,
i.e., a "resource directory" in the terminology defined in RFC 5693.

Let me give an example:

1. You (or some software on your laptop) try to access
--> DNS lookups for A records, TCP connection, HTTP GET ...

2. The server behind calls
XDOM( $ENV{REMOTE_ADDR}, "ALTO:https" ) , i.e. with your laptop's
IP address, or the "public" IP address of the outermost NAT in front
of your laptop, as a parameter.

3. If the XDOM procedure succeeds, it will return the URI of an
ALTO server (typically the one provided by your ISP).

4. The software on the HTTP server will contact said ALTO server to get
more information about topology, routing costs, etc. from the point of
view of your ISP.  Based on this ALTO information, it will choose one
of several known servers that can provide really-big-service-pack.bin
to your laptop; it picks the one that gives the highest throughput
and/or causes the least traffic cost.  It then returns an appropriate
HTTP redirect to your laptop.

5. Your laptop will get the large file from there.
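The server-side part of steps 2-4 could be sketched roughly as follows.
This is only an illustration: xdom_discover() and alto_cost() are
made-up placeholder functions (not an API from the draft), and the
mirror names and cost values are invented for the example.

```python
def xdom_discover(client_ip, service="ALTO:https"):
    # Placeholder for the XDOM procedure (steps 2-3): in reality this
    # issues reverse-DNS NAPTR queries for client_ip and returns the
    # discovered ALTO server URI, or None on failure.
    return "https://alto.example.net/directory"

def alto_cost(alto_uri, client_ip, mirror):
    # Placeholder for an ALTO cost-map query (step 4); toy values here.
    return {"cdn1.example.com": 10, "cdn2.example.com": 3}[mirror]

def choose_mirror(client_ip, mirrors):
    """Return the mirror the redirect server should send the client to."""
    alto_uri = xdom_discover(client_ip)
    if alto_uri is None:
        return mirrors[0]  # no ALTO guidance: fall back to any default
    # Pick the mirror with the lowest ALTO routing cost, then issue an
    # HTTP redirect to it (step 5).
    return min(mirrors, key=lambda m: alto_cost(alto_uri, client_ip, m))

print(choose_mirror("192.0.2.10", ["cdn1.example.com", "cdn2.example.com"]))
```

Note that the laptop itself never performs XDOM here; only the
centralized redirect server does, once per client request it serves.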

Of course, all this overhead only makes sense for larger data transfers,
probably not for regular "web surfing". That alone might limit the
number of XDOM invocations.

The DNS queries caused by the XDOM procedure will hit:

- The recursive name server next to the HTTP redirect server,
  or the tracker, etc. --> reasonable sizing of this name server
  is the duty of the CDN operator, tracker operator, etc.

- The authoritative name servers of the ISPs. If XDOM starts to take off
  and is used by many CDNs or trackers, the resulting load on these name
  servers might become an incentive for the ISPs to put the NAPTR RRs in
  place so that the first or second query succeeds.

- If ISPs do not install the NAPTR RRs, XDOM escalates and the queries
  will hit the authoritative servers of the RIRs etc.  But their
  answers (both NAPTR RRs and NXDOMAIN) can be cached in the recursive
  name servers next to the redirect servers or trackers.  As said, we
  assume the XDOM procedure will not run on billions of PCs, laptops and
  smartphones, but only on some (tens of) thousands of CDN servers, P2P
  trackers, etc. in the Internet.  Each of the adjacent recursive name
  servers will probably quickly learn all the answers at a /16 or /8
  level, and a cached result can be reused when the XDOM procedure is
  called for a different IP address from the same /8 or /16.  Therefore,
  the load on the RIRs' servers should be manageable as well.
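For illustration, the per-octet truncation that makes those /16 and /8
level cache hits possible could be sketched like this.  The sketch
assumes the lookup sequence walks from the full /32 reverse-DNS name
down to /8, one octet at a time; the authoritative definition of the
prefix lengths is in the draft itself.

```python
def reverse_dns_candidates(ipv4):
    """Build the in-addr.arpa names that NAPTR queries would target,
    from most specific (/32) down to least specific (/8)."""
    octets = ipv4.split(".")
    names = []
    for n in (4, 3, 2, 1):  # /32, /24, /16, /8
        names.append(".".join(reversed(octets[:n])) + ".in-addr.arpa")
    return names

for name in reverse_dns_candidates("198.51.100.7"):
    print(name)
```

A recursive resolver that has already cached the NXDOMAIN (or NAPTR)
answer for, say, 51.198.in-addr.arpa can answer the /16-level query
for any other address in 198.51.0.0/16 without asking the RIR again.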

Does this sound reasonable?



On Tue, Dec 18, 2018 at 12:37:45PM -0800, Warren Kumari wrote:
> Warren Kumari has entered the following ballot position for
> draft-ietf-alto-xdom-disc-04: Discuss
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
> Please refer to
> for more information about IESG DISCUSS and COMMENT positions.
> The document, along with other ballot positions, can be found here:
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> Note: I have not completed my review in detail (and so it may be answered
> further down), but I wanted to get this in early...
> I'm in no way an ALTO expert (I can barely spell it), so am hoping that I'm
> missing something obvious, but I'm really concerned by the scaling implications
> / cost shifting of this.
> Let's say this suddenly becomes very popular -- Apple includes this in the iOS
> App Store / iMessage app, or Chrome / Firefox decides to start doing this to
> find the best datacenter to send traffic to or something...
> Until the huge majority of ISPs start answering with these records for all of
> their subnets, it seems like there could be a sizable amount of traffic hitting
> a: the ISPs recursive servers, b: RIRs, and possibly c: AS112 servers.
> E.g: The address I get when I lookup is
> These are the lookups I'd need to do (I think!) if my $application (or, more
> worrying, framework / browser) were to use this:
> wkumari$ dig +nocomment +nostats +nocmd NAPTR
> ;   IN      NAPTR
> 59     IN      SOA
> 226022060 900 900 1800 60
> wkumari$ dig +nocomment +nostats +nocmd NAPTR
> ;       IN      NAPTR
> 59     IN      SOA
> 225983176 900 900 1800 60
> wkumari$ dig +nocomment +nostats +nocmd NAPTR
> ;           IN      NAPTR
>       1539    IN      SOA
> 2017026288 1800 900 691200 10800
> wkumari$ dig +nocomment +nostats +nocmd NAPTR
> ;              IN      NAPTR
>       1665    IN      SOA
> 2017026288 1800 900 691200 10800
> This is 4 lookups per host / app / connection hitting my recursive servers. In
> addition 2 of them hit Google's resolvers, and 2 hit ARINs. Yes, ARIN already
> gets many "reverse" queries, and my recursive already does lots of lookups, but
> the document doesn't (that I could see) discuss the potential fallout from
> potentially *lots* more load. Caching is only slightly effective here -- there
> are many many subnets, and e.g the ARIN NoData,NoError response will be cached
> for 1800 seconds (30 minutes).
> There are other examples -- for example, my laptop is currently on
> If I try connect to using an app which implements
> this, I'll have 4 queries hitting my recursive server (3 of which will get
> NXDOMAIN) and hitting ARINs servers.
> I'm assuming that I must be missing something obvious here, because I cannot
> see how the above sounds reasonable.