Re: [alto] ALTO and Content Delivery Networks

"Y. R. Yang" <yry@cs.yale.edu> Wed, 16 June 2010 05:08 UTC

Nice draft, and a good effort on an important problem that naturally
falls within the scope of ALTO.

Here are some comments. Some of the points come from discussions with
Xuan Zhang from Yale tonight. Sorry that the comments are a bit long.

First, some high level comments:
======================
- I feel that it would help to have a section enumerating some key
differences between a CDN setting and a (swarm) P2P setting. Such
differences can drive part of the requirements.

As an example difference, P2P has smart, adaptive clients, and thus can
have lower requirements on fault tolerance and load balancing: a peer
can have a large number of neighbors, and it evolves the topology and
balances load among this large set. In a CDN setting, on the other
hand, the serving set for a client is much smaller, and the clients are
assumed to be dumb and non-adaptive.

- I feel that it would help to define the problem settings first,
before delving into technical branches (HTTP redirect vs. DNS
redirect). The basic problem setting is that a client host needs to
select among a set of CDN nodes {CDN_1, CDN_2, ..., CDN_K}. A
fundamental challenge, I feel, is that ALTO info can be partial and
"colored" (i.e., it has a perspective). Thus, it can be helpful to
organize the discussion around perspective settings rather than around
technical-detail settings (HTTP redirect vs. DNS redirect). Two
fundamental settings are worth solving:

   (S1) Host H, {CDN_i}, and the network connecting them belong to a
single entity, and ALTO info comes from this single entity. This is a
relatively easy but useful setting.

   (S2) Host H belongs to an ISP, and {CDN_i} belong to a CSP. This
setting can be much more complex, because there are two perspectives:
C^ISP(H <-> CDN_i) vs. C^CSP(H <-> CDN_i). We could also introduce a
third perspective, for example, C(H <-> CDN_i) coming from a
third-party measurement source.
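To make the "colored" info issue concrete, here is a minimal Python
sketch with made-up numbers. It assumes each perspective can be
flattened into a plain host-to-node cost table; COSTS, best_node, and
best_node_weighted are hypothetical names, and the weighted sum is only
one possible way to reconcile the perspectives, not something the draft
proposes.

```python
# Hypothetical costs from host H to three CDN nodes, as reported by two
# different perspectives in setting (S2). All numbers are made up.
COSTS = {
    "ISP": {"CDN_1": 5, "CDN_2": 20, "CDN_3": 10},
    "CSP": {"CDN_1": 30, "CDN_2": 8, "CDN_3": 12},
}


def best_node(perspective):
    """Lowest-cost CDN node according to a single perspective."""
    costs = COSTS[perspective]
    return min(costs, key=costs.get)


def best_node_weighted(w_isp):
    """Lowest-cost node under an assumed weighted sum of both perspectives."""
    def combined(node):
        return w_isp * COSTS["ISP"][node] + (1 - w_isp) * COSTS["CSP"][node]
    return min(COSTS["ISP"], key=combined)


print(best_node("ISP"), best_node("CSP"), best_node_weighted(0.5))
# prints: CDN_1 CDN_2 CDN_3
```

With equal weights the compromise picks CDN_3, which neither
perspective ranks first, so the choice of perspective (or of how to
combine perspectives) matters as much as the redirection mechanism.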

- Since this is about CDNs, would it make sense to demonstrate, as a
use case, how ALTO info may be integrated into a major CDN system? Let
me try an Akamai-like DNS-based system, based on my understanding of
their patent from several years ago. If anyone knows better public
info, please correct/update me. Let's call this system A. In system A,
the first step is to map the source host H to a serving region
represented by a higher-level DNS server. To achieve this mapping, we
may do the following:

Step 1: Map the source address of H to SPID, a source PID (e.g., with
PIDs partitioned according to local DNS server).
  This mapping can be provided by an ALTO Network Map from an ALTO Server.
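Step 1 is essentially a longest-prefix match of H's address against the
PID prefixes. A minimal Python sketch, assuming a hypothetical Network
Map (PID -> list of CIDR prefixes) that includes a catch-all default
PID; the PID names and prefixes are made up:

```python
import ipaddress

# Hypothetical Network Map: PID -> CIDR prefixes.
NETWORK_MAP = {
    "SPID1": ["10.0.0.0/8"],
    "SPID2": ["10.1.0.0/16", "192.0.2.0/24"],
    "SPID3": ["0.0.0.0/0"],  # catch-all default PID
}


def src_pid(addr):
    """Map a source address to its PID by longest-prefix match."""
    ip = ipaddress.ip_address(addr)
    best_pid, best_len = None, -1
    for pid, prefixes in NETWORK_MAP.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            # Keep the most specific prefix that contains the address.
            if ip in net and net.prefixlen > best_len:
                best_pid, best_len = pid, net.prefixlen
    if best_pid is None:
        raise LookupError(f"no PID covers {addr}")
    return best_pid


print(src_pid("10.1.2.3"))  # SPID2 (the /16 beats the /8)
```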

Step 2: Look up SPID in a cost map

            Region 1  Region 2 .......... Region K
SPID1
SPID2
...

After this lookup, we identify the lowest-cost/closest region
(represented by RID).

   This mapping can be provided by an ALTO Cost Map from an ALTO Server.
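Step 2 then reduces to an argmin over one row of the cost map. A sketch
with a hypothetical COST_MAP; the tie-break by RID name is my addition
to keep the choice deterministic:

```python
# Hypothetical Cost Map: source PID -> {region ID: cost}.
COST_MAP = {
    "SPID1": {"Region1": 10, "Region2": 40, "Region3": 25},
    "SPID2": {"Region1": 35, "Region2": 5, "Region3": 20},
}


def closest_region(spid):
    """Return the RID with the lowest cost from the given source PID."""
    row = COST_MAP[spid]
    # Ties are broken by RID name so the choice is deterministic.
    return min(row, key=lambda rid: (row[rid], rid))


print(closest_region("SPID2"))  # Region2
```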

Step 3: For a distributed implementation, the system redirects to the
DNS server for the identified region. This map can have the format:
Region -> lower-level DNS server.

This mapping may not be provided by ALTO.

Note that the three steps can be streamlined into a single hash-based
lookup.
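One way to read the streamlining: if PIDs are partitioned by local DNS
resolver, the address a DNS-based system sees is the resolver's, so
Steps 1 to 3 can be precomputed into one exact-match table. A sketch
under that assumption; all addresses, PIDs, and server names are
hypothetical:

```python
# Hypothetical inputs. LDNS_TO_SPID plays the role of Step 1 (PIDs are
# partitioned by local DNS resolver), COST_MAP is Step 2, and REGION_DNS
# is the non-ALTO Step 3 table.
LDNS_TO_SPID = {"203.0.113.53": "SPID1", "198.51.100.53": "SPID2"}
COST_MAP = {
    "SPID1": {"Region1": 10, "Region2": 40},
    "SPID2": {"Region1": 35, "Region2": 5},
}
REGION_DNS = {"Region1": "ns.region1.example", "Region2": "ns.region2.example"}


def build_table(ldns_to_spid, cost_map, region_dns):
    """Precompute resolver address -> lower-level DNS server."""
    table = {}
    for ldns, spid in ldns_to_spid.items():
        row = cost_map[spid]
        rid = min(row, key=lambda r: (row[r], r))  # Step 2: closest region
        table[ldns] = region_dns[rid]              # Step 3: region -> DNS
    return table


TABLE = build_table(LDNS_TO_SPID, COST_MAP, REGION_DNS)
print(TABLE["198.51.100.53"])  # ns.region2.example
```

At query time this is a single dict access; the table only needs
rebuilding when the Network Map, Cost Map, or region assignment changes.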

At a lower-level DNS server:

An offline computation (maybe with some online-triggered load-balancing
updates) uses consistent hashing and bin packing to compute the map:

Dest address (including the bucket/customer info as part of the
destination name) -> a short list of CDN servers.
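A minimal sketch of the consistent-hashing part, assuming a simple
per-server capacity check as a crude stand-in for the bin-packing step.
This is not the algorithm from the patent; the Ring class, vnodes,
load, and capacity are all hypothetical:

```python
import bisect
import hashlib


def _h(key):
    """Stable 64-bit hash; Python's built-in hash() is salted per process."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Ring:
    """Consistent-hash ring with several virtual points per CDN server."""

    def __init__(self, servers, vnodes=50):
        self._points = sorted(
            (_h(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._keys = [p for p, _ in self._points]

    def short_list(self, name, k, load, capacity):
        """Walk clockwise from hash(name) and return up to k distinct
        servers whose current load is below capacity."""
        out = []
        start = bisect.bisect(self._keys, _h(name))
        for i in range(len(self._points)):
            server = self._points[(start + i) % len(self._points)][1]
            if server not in out and load.get(server, 0) < capacity:
                out.append(server)
                if len(out) == k:
                    break
        return out


# "name" stands for the destination, including the customer/bucket part.
ring = Ring(["s1", "s2", "s3", "s4", "s5"])
print(ring.short_list("cust42.example/video.mp4", k=2, load={}, capacity=10))
```

Because each server owns many small arcs of the ring, removing a server
only changes the short lists of names whose walk actually reached that
server, which keeps the offline map stable across updates.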

This map (Dest address -> short list) may not come from ALTO. But after
the selection, there can be fine-grained tuning according to the source
address (otherwise, why deploy deep instead of in clusters/data
centers?). The fine-tuning can use a local, fine-grained ALTO Map, but
it needs to be careful not to break load balancing.
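The fine-tuning step could look like the following sketch: re-rank the
short list by a local ALTO cost from the client's PID, but only among
servers that still have spare capacity, so proximity never overrides
the load-balancing decision. The capacity guard is my assumption of
what "careful" might mean; names and numbers are hypothetical:

```python
def fine_tune(short_list, src_pid, cost, load, capacity):
    """Pick the lowest-cost server (from src_pid) among short-list servers
    below capacity; ties keep the original short-list order. If every
    server is at capacity, fall back to the first short-list entry."""
    eligible = [s for s in short_list if load.get(s, 0) < capacity]
    if not eligible:
        return short_list[0]
    return min(eligible, key=lambda s: (cost[src_pid][s], short_list.index(s)))


COST = {"P1": {"a": 30, "b": 5, "c": 10}}  # hypothetical local ALTO costs
print(fine_tune(["a", "b", "c"], "P1", COST, load={"b": 10}, capacity=10))  # c
```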

Some detailed comments:
==================
- Section 4: it would help to clarify the setting. The wording implies
a single ALTO Server, which I assume by default gives the perspective
of the CDN network. Section 4.4 touches on this issue; moving it a bit
earlier could help.

- Section 4: " ... intercepts an HTTP GET request (1)": add a reference 
to Figure 1.

- First para below Figure 1: I do not understand why you need to 
disambiguate PIDs containing only hosts from PIDs containing CDN nodes. 
It would help to elaborate.

- First/second para below Figure 1: Why restrict the costs to those
from hosts to CDN nodes? As an example, Akamai streaming uses multiple
levels of CDN nodes (entry points, reflectors, edge servers). Knowing
the costs between these CDN nodes can be helpful when computing
redirection.

- Second para below Figure 1: How is the CDN PID determined from the
hostname (domain name) of a URL?
   Is this sentence trying to address that issue: "Therefore the IP
addresses contained in the cost maps may need to be correlated to domain
names a priori."? If so, it is still not fully clear. From the big
picture, the process seems to be: (1) map the URL to a list of IP
addresses, and (2) look up these addresses in the Map for redirection.
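Under that reading, the two-step process might look like this sketch.
It assumes the name-to-address correlation is done a priori (no live
DNS query) and that the CDN prefixes do not overlap, so a first match
is also the longest match. All names and tables are hypothetical:

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical a-priori correlation of CDN domain names to addresses.
NAME_TO_ADDRS = {"cdn.example.net": ["192.0.2.10", "198.51.100.20"]}
# Hypothetical Network Map (non-overlapping CDN prefixes) and Cost Map.
NETWORK_MAP = {"CDN-PID-A": ["192.0.2.0/24"], "CDN-PID-B": ["198.51.100.0/24"]}
COST_FROM_HOST_PID = {"HOST-PID-1": {"CDN-PID-A": 50, "CDN-PID-B": 15}}


def pid_of(addr):
    """First PID whose prefixes contain addr (None if no PID does)."""
    ip = ipaddress.ip_address(addr)
    for pid, prefixes in NETWORK_MAP.items():
        if any(ip in ipaddress.ip_network(p) for p in prefixes):
            return pid
    return None


def rank_candidates(url, host_pid):
    """(1) URL -> candidate IPs; (2) IPs -> PIDs -> costs, cheapest first."""
    candidates = NAME_TO_ADDRS[urlsplit(url).hostname]
    costs = COST_FROM_HOST_PID[host_pid]
    return sorted(candidates, key=lambda a: costs[pid_of(a)])


print(rank_candidates("http://cdn.example.net/v/1.mp4", "HOST-PID-1"))
# ['198.51.100.20', '192.0.2.10']
```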

- Second para below Figure 1: For the last sentence, it would help to
make clear that the selection algorithm can be quite flexible and
customizable. For example, a standard algorithm I cover in my class
(from an Akamai patent application) uses consistent hashing plus bin
packing.

- GAP-1: I have no problem adding PID attributes. But the motivation,
in the context of the document, is not fully clear, as the document
does not later make explicit how they could be used. (Did I miss it?
If so, a forward reference would help.)

- top of page 7: "a appropriate" -> "an appropriate"

- top of page 7: "The issue of default cost if one of importance." if -> is?

- I like that the document presents two approaches, in Sections 4.3
and 4.4 respectively. I feel that Section 4.3 is conceptually simpler.
For Section 4.4, there is the issue of converting application info
(CDN node load) into ALTO info. You may be forced to use fine-grained
PIDs in order to distinguish different CDN servers; or you can average
the load of the servers at a given location and add it to the ALTO
costs, but this can be less effective at achieving load balancing. Also
note that Section 4.4 will make the ALTO info application dependent
during the conversion.
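A sketch of the averaging option shows the loss of resolution. It
assumes two servers share one PID and that load is added to the network
cost with an application-chosen weight (itself an example of the
conversion becoming application dependent); all values are
hypothetical:

```python
from statistics import mean

# Hypothetical: srv-a and srv-b sit at the same location (PID-X) but
# have very different loads (fraction of capacity).
SERVER_PID = {"srv-a": "PID-X", "srv-b": "PID-X", "srv-c": "PID-Y"}
SERVER_LOAD = {"srv-a": 0.9, "srv-b": 0.1, "srv-c": 0.5}
BASE_COST = {"PID-X": 10.0, "PID-Y": 12.0}  # network-only ALTO cost


def load_adjusted_costs(weight):
    """Add weight * (average load of the PID's servers) to each PID cost."""
    loads_by_pid = {}
    for srv, pid in SERVER_PID.items():
        loads_by_pid.setdefault(pid, []).append(SERVER_LOAD[srv])
    return {pid: BASE_COST[pid] + weight * mean(loads)
            for pid, loads in loads_by_pid.items()}


print(load_adjusted_costs(10.0))  # {'PID-X': 15.0, 'PID-Y': 17.0}
```

srv-a (90% loaded) and srv-b (10% loaded) end up behind the same PID-X
cost, so the ALTO info alone cannot steer clients away from srv-a.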

- First para after Figure 3: why is partitioning recommended?

- Second/third para of Section 5: inconsistent use of "Proxy" and "DNS
Proxy".

- Section 6 and others: selection need not be limited to the cost of
outgoing CDN traffic; in some settings, the selection can be based on
the incoming cost, for example, for UGN.
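Supporting both directions mostly means choosing which direction of the
(possibly asymmetric) cost to look up. A sketch with a hypothetical
directional cost table keyed by (source, destination):

```python
# Hypothetical directional costs: COST[(src, dst)]; not necessarily symmetric.
COST = {
    ("HOST", "CDN_1"): 5, ("CDN_1", "HOST"): 40,
    ("HOST", "CDN_2"): 15, ("CDN_2", "HOST"): 10,
}


def select(nodes, upload):
    """Downloads rank by CDN->host (outgoing) cost; uploads rank by
    host->CDN (incoming) cost."""
    def cost(node):
        return COST[("HOST", node)] if upload else COST[(node, "HOST")]
    return min(nodes, key=cost)


print(select(["CDN_1", "CDN_2"], upload=False))  # CDN_2
print(select(["CDN_1", "CDN_2"], upload=True))   # CDN_1
```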

- Figure 5: note that the general case can be more complex. At the P2Pi
meeting, a major issue we were trying to address was that there can be
multiple ISPs between a subscriber and the CDN. I hope that the
"flattening of the Internet" makes this less of a problem.

- GAP-6 and GAP-7: I am not sure there is a need to define an explicit
Border Router Attribute.

Richard
On 6/4/2010 12:09 PM, Reinaldo Penno wrote:
> We posted a new Internet Draft on ALTO and CDNs
>
> http://www.ietf.org/id/draft-penno-alto-cdn-00.txt
>
> Regards,
>
> Reinaldo