All of the WWW Available **Forever**

Edward Cherlin <> Tue, 20 May 1997 00:05 UTC

Date: Mon, 19 May 1997 10:20:43 -0700
From: Edward Cherlin <>
Subject: All of the WWW Available **Forever**

This suggests a new URL scheme: a traditional URL plus a date, resolved
against this archive. Something similar would work for Usenet, directed to Deja News.
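
A minimal sketch of how such a dated URL might be built, parsed, and resolved. The `archive:` prefix, the `YYYYMMDD` date field, and the in-memory `SNAPSHOTS` index are hypothetical, invented for illustration. Neither the Archive nor Alexa is known to expose this interface. Resolution returns the stored snapshot nearest the requested date, which follows the "URL and an approximate date" idea in the quoted message below.

```python
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical dated-URL form: "archive:YYYYMMDD:<traditional URL>".
SCHEME = "archive:"

# Hypothetical archive index: each URL maps to the dates of the sweeps
# ("frozen Webs") in which a copy of it was stored.
SNAPSHOTS: Dict[str, List[date]] = {
    "http://www.example.com/": [date(1996, 10, 1), date(1997, 3, 15)],
}


def make_dated_url(url: str, when: date) -> str:
    """Combine a traditional URL and a date into one dated URL."""
    return f"{SCHEME}{when:%Y%m%d}:{url}"


def parse_dated_url(dated: str) -> Tuple[str, date]:
    """Split a dated URL back into (traditional URL, requested date)."""
    if not dated.startswith(SCHEME):
        raise ValueError(f"not a dated URL: {dated}")
    stamp, url = dated[len(SCHEME):].split(":", 1)
    return url, datetime.strptime(stamp, "%Y%m%d").date()


def resolve(dated: str) -> Optional[date]:
    """Return the stored snapshot date nearest the requested date,
    or None if the URL never appeared in any sweep."""
    url, wanted = parse_dated_url(dated)
    dates = SNAPSHOTS.get(url)
    if not dates:
        return None
    return min(dates, key=lambda d: abs((d - wanted).days))


if __name__ == "__main__":
    link = make_dated_url("http://www.example.com/", date(1997, 3, 1))
    print(link)           # archive:19970301:http://www.example.com/
    print(resolve(link))  # 1997-03-15
```

Picking the nearest snapshot instead of requiring an exact match is what makes an approximate date usable. A real service might prefer the latest snapshot on or before the date. The same shape would serve a Usenet variant, with message IDs in place of URLs.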

>Subject:  All of the WWW Available **Forever**
>From: ____Textpert Alert____ <>
>Date:  Mon, 19 May 1997 14:41:14 +0200
>  True to my handle, I'd like to alert y'all to the truly
>  Xanadudlian mission of the start-up Internet Archive and Alexa
>  companies, the former a non-profit effort to continuously
>      s t o r e  ALL OF (unrestricted-access) WWW pages FOREVER ;
>  the second a commercial outfit developing tools to browse and
>  reuse such cumulative/ multi-generation archive contents.
>  According to their owner Brewster Kahle --formerly of the Thinking
>  Machines Corp., and a father of WAIS-- one of the target functions
>  of Alexa-derived software is to be a `"reliability service" that
>  will resurrect dead links.  Give the URL and an approximate date
>  to the Archive, and it will dig up the document.'.....  rings a
>  bell, doesn't it?
>  The Alexa archives are made of successive sweep-n-suck (BIIIG
>  sucks, too) sessions of the entire WWW dataspace resulting in
>  consecutive "frozen Webs" stored at one location -- currently
>  a warehouse in SF; ultimately in the digital storage facility of
>  the US National Archives in Washington, D.C.  Treating an entire
>  docuverse as a collection of "barts" (or "stamps", I keep mixing
>  them up) may sound like a bit of overkill, but whoever said that
>  the (yellow brick) road to Xanadu must be straight and narrow?
>Based on Paul Bissex' article at:
>>           [...] whereas keyword search engines [AltaVista etc]
>>           store an index to the Web, the Archive consists of a
>>           copy of the Web itself. Kahle estimates the current
>>           size of the Web at about two terabytes (that's two
>>           million megabytes). Having completed two full sweeps
>>           of the Web, the Archive now contains about four
>>           terabytes of data. A recent upgrade of the Archive's
>>           connection from two T1 lines to a full T3 brings
>>           a welcome 15-fold increase in bandwidth, meaning
>>           that future Web "snapshots" will be conducted much
>>           faster than the first two. With some researchers
>>           estimating the average life of a Web page at 75 days,
>>           speed matters.
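
As a rough check on the quoted figures, here is the arithmetic in Python. The line rates are the standard nominal T-carrier rates (T1 = 1.544 Mbit/s, T3 = 44.736 Mbit/s). The sweep-time estimate is an idealized assumption, not a figure from the article: it treats the whole line rate as usable and ignores protocol overhead and server latency.

```python
# Nominal line rates in megabits per second (standard T-carrier figures).
T1_MBPS = 1.544
T3_MBPS = 44.736

old_link = 2 * T1_MBPS          # two T1 lines
new_link = T3_MBPS              # one full T3
speedup = new_link / old_link   # about 14.5, i.e. roughly "15-fold"

WEB_SIZE_TB = 2                          # Kahle's estimate of the Web's size
web_bits = WEB_SIZE_TB * 10**12 * 8      # decimal terabytes -> bits


def sweep_days(link_mbps: float) -> float:
    """Idealized days to copy the whole Web once at full line rate."""
    seconds = web_bits / (link_mbps * 10**6)
    return seconds / 86_400


print(f"speed-up: {speedup:.1f}x")                          # speed-up: 14.5x
print(f"sweep over 2xT1: {sweep_days(old_link):.0f} days")  # 60 days
print(f"sweep over T3:   {sweep_days(new_link):.1f} days")  # 4.1 days
```

Under these assumptions a full sweep drops from about two months to about four days. That difference matters when pages last around 75 days on average.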

Edward Cherlin
Vice President
NewbieNet, Inc.

Help outlaw Spam
1000 members and counting
17 May 97

Everything should be made as simple as possible, __but no simpler__.
Attributed to Albert Einstein