Re: [alto] ALTO and Content Delivery Networks
"Y. R. Yang" <yry@cs.yale.edu> Wed, 16 June 2010 05:08 UTC
Message-ID: <4C185C7C.9030105@cs.yale.edu>
Date: Wed, 16 Jun 2010 01:09:16 -0400
From: "Y. R. Yang" <yry@cs.yale.edu>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4
MIME-Version: 1.0
To: alto@ietf.org
References: <C82E733B.201B3%rpenno@juniper.net>
In-Reply-To: <C82E733B.201B3%rpenno@juniper.net>
Subject: Re: [alto] ALTO and Content Delivery Networks
Nice draft, and a good effort on an important problem that is naturally in the scope of ALTO. Here are some comments. Some points include discussions with Xuan Zhang from Yale tonight. Sorry the comments are a bit long.

First, some high level comments:
======================

- I feel that it can be helpful to have a section enumerating some key differences between a CDN setting and a (swarm) P2P setting. Such differences can drive part of the requirements. As an example, P2P has smart, adaptive clients, and thus can have lower requirements on fault tolerance and load balancing. A peer can have a large number of neighbors, and will evolve the topology and load balance among this large set. In a CDN setting, on the other hand, the serving set for a client is much smaller, and the clients are assumed to be dumb and non-adaptive.

- I feel that it can be helpful to define the problem settings first, before delving into technical branches (HTTP Redirect vs DNS redirect). The basic problem setting is that we have a client Host H that needs to select among a set of CDN nodes {CDN_1, CDN_2, ..., CDN_K}. A fundamental challenge, I feel, is that ALTO info can be partial and "colored" (i.e., has a perspective). Thus, it can be helpful to organize the discussion by perspective setting rather than by technical detail (HTTP Redirect vs DNS redirect). It can be helpful to solve two fundamental settings:

  (S1) Host H, {CDN_i}, and the network connecting them belong to a single entity. ALTO info is from this single entity. This is a relatively easy, useful setting.

  (S2) Host H belongs to an ISP, and {CDN_i} belong to a CSP. This setting can be much more complex, because there are two perspectives: C^ISP(H <-> CDN_i) vs C^CSP(H <-> CDN_i). Or we could introduce a third perspective, for example, C(H <-> CDN_i) coming from a third measurement party.

- Since this is CDN, does it make sense to demonstrate, as a use case, how ALTO info may be integrated into the system of a major CDN? Let me try an Akamai-like DNS based system, according to my understanding from their patent several years ago. Anyone who knows better public info, please correct/update me. Let's call this system A.

  In system A, the first step is to map src Host H to a serving region represented by a higher level DNS server. To achieve this mapping, we may do the following:

  Step 1: Map from src address H to SPID, a src PID (e.g., partitioned according to local DNS server). This mapping can be provided by ALTO Map, from an ALTO Server.

  Step 2: Look up SPID in a cost map:

               Region 1   Region 2   ...   Region K
      SPID1
      SPID2
      ...

  After this lookup, we identify the lowest cost/closest region (represented by RID). This mapping can be provided by ALTO Map, from an ALTO Server.

  Step 3: For a distributed implementation, the system directs to the corresponding DNS server for the identified region. This map can have the format: Region -> lower level DNS server. This mapping may not be provided by ALTO.

  Note that the three steps can be streamlined into a single hash implementation.

  At a lower level DNS server: offline computation (maybe with some online triggered load balancing update), using consistent hashing and bin packing, computes the map: Dest address (including the bucket/customer info as part of the destination name) -> a short list of CDN servers. This map may not come from ALTO. But after the selection, there can be fine-grained tuning according to source address (otherwise, why deep deployment, instead of clustering/data centers?). The fine-tuning can use a local, fine-grained ALTO Map, but needs to be careful not to break load balancing.

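The three mapping steps for system A can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the prefixes, PID names, region IDs, costs, and DNS server names are hypothetical, and the code reflects neither the draft nor how any real CDN is implemented.

```python
import ipaddress

# Step 1 data (hypothetical): source prefix -> source PID (SPID),
# as an ALTO map might provide.
NETWORK_MAP = {
    "192.0.2.0/25": "SPID1",
    "192.0.2.128/25": "SPID2",
}

# Step 2 data (hypothetical): SPID -> {region ID: cost}.
COST_MAP = {
    "SPID1": {"R1": 10, "R2": 40},
    "SPID2": {"R1": 30, "R2": 5},
}

# Step 3 data (hypothetical): region -> lower level DNS server.
# This map is assumed to come from the CDN itself, not from ALTO.
REGION_DNS = {
    "R1": "ns.r1.example.net",
    "R2": "ns.r2.example.net",
}


def source_pid(addr):
    """Step 1: map a source address to its SPID by longest prefix match."""
    ip = ipaddress.ip_address(addr)
    best = None  # (prefix length, SPID) of the most specific match so far
    for prefix, pid in NETWORK_MAP.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, pid)
    if best is None:
        raise KeyError("no PID covers " + addr)
    return best[1]


def closest_region(spid):
    """Step 2: pick the region (RID) with the lowest cost for this SPID."""
    costs = COST_MAP[spid]
    return min(costs, key=costs.get)


def resolve(addr):
    """Step 3: direct the request to the DNS server of the chosen region."""
    return REGION_DNS[closest_region(source_pid(addr))]
```

For example, `resolve("192.0.2.10")` walks SPID1 -> R1 -> `ns.r1.example.net`. Precomputing `resolve` for every prefix in the map would give the single-table form of the three steps.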
Some detailed comments:
==================

- Section 4: it can be helpful to clarify the setting: the wording implies a single ALTO Server, which I assume, by default, gives the perspective of the CDN network. Section 4.4 touches upon this issue. Moving it a bit forward can be helpful.

- Section 4: " ... intercepts an HTTP GET request (1)": add a reference to Figure 1.

- First para below Figure 1: I do not understand why you need to disambiguate PIDs containing only hosts from PIDs containing CDN nodes. It may be helpful to elaborate more.

- First/second para below Figure 1: Why do you have to enforce only costs from host to CDN? As an example, Akamai streaming uses multiple levels of CDN nodes (Entry points, reflectors, Edge Servers). Knowing info between these inter-CDN nodes can be helpful when computing redirection.

- Second para below Figure 1: How do you determine the CDN PID from the hostname (domain name) of a URL? Is this sentence trying to address the issue: "Therefore the IP addresses contained in the cost maps may need to be correlated to domain names a priori."? But this is still not fully clear. From the big picture, it seems that the process is: (1) map from URL to a list of IP addresses, and (2) look up in the Map for direction.

- Second para below Figure 1: For the last sentence, it can be helpful to make it clear that the selection algorithm can be quite flexible and customizable. For example, a standard algorithm I cover in my class (from an Akamai patent application) is to use consistent hashing + bin packing.

- GAP-1: I have no problem adding PID attributes. But the motivation, in the context of the document, is not fully clear, as it is not made explicit later how they could be used (did I miss it? If so, it can be helpful to add a forward reference).

- top of page 7: "a appropriate" -> "an appropriate"

- top of page 7: "The issue of default cost if one of importance." if -> is?

- I like it that the document presents two approaches in Sections 4.3 and 4.4 respectively. I feel that Section 4.3 is conceptually simpler. For Section 4.4, there is the issue of converting application info (CDN node load) to ALTO info. You may be forced to use fine-grained PIDs in order to distinguish different CDN servers; or you can use some averaging of the load of servers at a given location and add it to the ALTO costs, but this can be less effective in achieving load balancing. Also note that Section 4.4 will force the ALTO info to be application dependent during conversion.

- First para after Figure 3: why the recommendation to partition?

- Second/third para of Section 5: mixed use of Proxy and DNS Proxy.

- Section 6 and others: it may not be necessary to limit selection to the cost of CDN outgoing traffic; in some settings, the selection can be based on incoming cost, for example, for UGN.

- Figure 5: note that a general case can be more complex: at the P2Pi meeting, a major issue we were trying to address was that there can be multiple ISPs in between a subscriber and the CDN. I hope that the "flattening-of-the-Internet" makes this less of a problem.

- GAP-6 and GAP-7: I am not sure there is a need for defining an explicit Border Router Attribute.

Richard

On 6/4/2010 12:09 PM, Reinaldo Penno wrote:
> We posted a new Internet Draft on ALTO and CDNs
>
> http://www.ietf.org/id/draft-penno-alto-cdn-00.txt
>
> Regards,
>
> Reinaldo
>
> _______________________________________________
> alto mailing list
> alto@ietf.org
> https://www.ietf.org/mailman/listinfo/alto
>