Forward proxies and CDN/mirrors
Jack Bates <jzej8k@nottheoilrig.com> Sat, 19 May 2012 07:54 UTC
Message-ID: <4FB75146.1060609@nottheoilrig.com>
Date: Sat, 19 May 2012 00:52:38 -0700
From: Jack Bates <jzej8k@nottheoilrig.com>
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:11.0) Gecko/20120327 Thunderbird/11.0.1
MIME-Version: 1.0
To: ietf-http-wg@w3.org
CC: Anthony Bryan <anthonybryan@gmail.com>, Leif Hedstrom <zwoop@apache.org>
References: <CANqTPeivxKNJD0pzyGWWeer-4fxKpKU_zAp+7WrheizukaEEGg@mail.gmail.com>
Subject: Forward proxies and CDN/mirrors
Archived-At: <http://www.w3.org/mid/4FB75146.1060609@nottheoilrig.com>

Hello,

I am curious to know the current thinking on HTTP forward proxies and content distribution networks, or download mirrors. What techniques are used to help forward proxies and content distribution networks play well together? What facilities are available in the HTTP protocol for this? What resources are available from the broader community of standards and best practices?

The approach that I am currently pursuing is to use RFC 6249, Metalink/HTTP: Mirrors and Hashes. For those content distribution networks that support it, our forward proxy listens for responses that are an HTTP redirect and have "Link: <...>; rel=duplicate" headers. If the URL in the "Location: ..." header is not already cached, then we scan the "Link: <...>; rel=duplicate" headers for a URL that is already cached and, if found, we rewrite the "Location: ..." header with this URL.

I would be very grateful for any feedback on this approach. What are the problems with this strategy? What are the alternatives? How does it relate to the letter or spirit of web architecture?

We are also thinking of using RFC 3230, Instance Digests in HTTP. Our proxy would listen for HTTP redirect responses that had "Digest: ..." headers. If the URL in the "Location: ..." header were not already cached, then we would check if other content with the same digest were already cached. If so, then we would rewrite the "Location: ..." header with the corresponding URL.

The issue of forward proxies and content distribution networks is important to us because we run a caching proxy here at a rural village in Rwanda. Many web sites that distribute files present users with a simple download button that redirects to a download mirror, but they do not predictably redirect to the same mirror, or to a mirror that we already cached, so users can't predict whether a download will take seconds or hours, which is frustrating.

Here is a proof of concept plugin [1] for the Apache Traffic Server open source caching proxy. It works just enough that, given a response with a "Location: ..." header that is not already cached and a "Link: <...>; rel=duplicate" header that is already cached, it will replace the URL in the "Location: ..." header with the cached URL.

I am working on this as part of the Google Summer of Code.

[1] https://github.com/jablko/dedup
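
To make the two rewriting rules above concrete, here is a minimal Python sketch. It is not the code of the plugin at [1], and it does not use the Apache Traffic Server API. The function names, the `is_cached` callback, the `digest_index` mapping, and the example mirror host names are all assumptions made for illustration, and the Link header parsing is deliberately simplified (it does not handle commas inside quoted parameter values).

```python
import re

# Simplified Link header grammar: <URL> followed by ";name=value" parameters.
# Assumption: parameter values contain no commas or semicolons.
_LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^,;]+)*)')

REDIRECT_STATUSES = (301, 302, 303, 307)


def duplicate_urls(link_values):
    """Yield URLs from Link header values whose rel includes "duplicate"."""
    for value in link_values:
        for match in _LINK_RE.finditer(value):
            url, params = match.group(1), match.group(2)
            for param in params.split(';'):
                name, _, val = param.strip().partition('=')
                rels = val.strip().strip('"').lower().split()
                if name.strip().lower() == 'rel' and 'duplicate' in rels:
                    yield url


def digest_keys(digest_values):
    """Yield (algorithm, encoded digest) pairs from Digest header values."""
    for value in digest_values:
        for item in value.split(','):
            algo, sep, encoded = item.strip().partition('=')
            if sep:
                yield (algo.lower(), encoded)


def rewrite_location(status, headers, is_cached, digest_index=None):
    """Return headers with Location pointing at an already cached copy.

    headers      -- list of (name, value) pairs from the redirect response
    is_cached    -- hypothetical callback: URL -> bool
    digest_index -- hypothetical mapping: (algorithm, digest) -> cached URL
    """
    if status not in REDIRECT_STATUSES:
        return headers
    location = next((v for n, v in headers if n.lower() == 'location'), None)
    if location is None or is_cached(location):
        return headers  # nothing to do: no redirect target, or already cached

    def with_location(url):
        return [(n, url if n.lower() == 'location' else v) for n, v in headers]

    # Rule 1 (RFC 6249): prefer a "rel=duplicate" mirror that is cached.
    links = [v for n, v in headers if n.lower() == 'link']
    for url in duplicate_urls(links):
        if is_cached(url):
            return with_location(url)

    # Rule 2 (RFC 3230): fall back to any cached content with the same digest.
    if digest_index:
        digests = [v for n, v in headers if n.lower() == 'digest']
        for key in digest_keys(digests):
            if key in digest_index:
                return with_location(digest_index[key])
    return headers


if __name__ == '__main__':
    cache = {'http://mirror-b.example/file.iso'}
    response_headers = [
        ('Location', 'http://mirror-a.example/file.iso'),
        ('Link', '<http://mirror-b.example/file.iso>; rel=duplicate'),
    ]
    print(rewrite_location(302, response_headers, cache.__contains__))
```

A real proxy would also need to decide how far to trust a digest or a mirror list supplied by the redirecting server; this sketch leaves that question out.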