Essence prototype announcement
Mike Schwartz <schwartz@latour.cs.colorado.edu> Wed, 13 January 1993 21:35 UTC
Date: Wed, 13 Jan 1993 10:46:56 -0700
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Mike Schwartz <schwartz@latour.cs.colorado.edu>
Message-Id: <199301131746.AA25471@latour.cs.colorado.edu>
To: Essence-announcement-list@latour.cs.colorado.edu
Subject: Essence prototype announcement
Essence is a resource discovery system that exploits file semantics to
index both textual and binary files. Essence generates summaries that
can be used to browse files before retrieving them across slow network
links, as well as space-efficient indexes. Essence understands nested
file structures (such as uuencoded, compressed, "tar" files), and
recursively unravels such files to generate summaries for them. These
features allow Essence to be used in a number of useful settings, such
as anonymous FTP archives.

The prototype generates WAIS-compatible indexes, allowing WAIS users to
take advantage of the Essence indexing methods. WAIS users can try
Essence using the ".src" file enclosed below. This file also describes
where to get the prototype source code and a paper about this system.

    Darren Hardy and Michael Schwartz
    Dept. of Computer Science
    Univ. of Colorado - Boulder

-------------------------------------------------------------------------------
(:source
   :version 3
   :ip-address "128.138.243.151"
   :ip-name "ftp.cs.colorado.edu"
   :tcp-port 8000
   :database-name "aftp-cs-colorado-edu"
   :cost 0.00
   :cost-unit :free
   :maintainer "hardy@cs.colorado.edu"
   :description "You can use this WAIS server to search and retrieve
files from the anonymous ftp archive on ftp.cs.colorado.edu
[128.138.243.151]. We used Essence, a resource discovery system based
on semantic file indexing, to build the WAIS index for this server. As
explained below, Essence currently only allows the retrieval of file
summaries through WAIS. To retrieve entire files, use anonymous ftp on
ftp.cs.colorado.edu.

Essence exploits file semantics to index both textual and binary files.
By exploiting semantics, Essence extracts keywords that summarize a
file, and generates a compact yet representative index. Essence
understands nested file structures (such as uuencoded, compressed,
``tar'' files), and recursively unravels such files to generate
summaries for them.
Essence generates indexes that are ten times smaller than WAIS indexes,
yet retain the fine-grained information access that WAIS's full-text
indexes provide. Furthermore, Essence generates WAIS-compatible indexes,
allowing WAIS users to make use of Essence's indexing capabilities.
This is one of the ways that the Networked Resource Discovery Project
at the University of Colorado has extended the conceptual paradigm of
the type of information that WAIS handles.

If you would like to learn more about Essence, you can obtain the
source to the Essence prototype and a paper that appears in the 1993
Winter USENIX Technical Conference, San Diego, CA, January 1993,
pp. 361-374. Both the paper and the prototype are available via
anonymous ftp from ftp.cs.colorado.edu in /pub/cs/distribs/essence.
Alternatively, search for the keyword 'Essence' using this WAIS server
to find all of the files on ftp.cs.colorado.edu that are related to
Essence; you will find the files for both the paper and the prototype.

This WAIS server was created in December 1992 by Darren R. Hardy and
Michael F. Schwartz as part of the Networked Resource Discovery
Project. You may reach them at the Department of Computer Science,
University of Colorado, Boulder, CO 80309-0430, or via email at
hardy@cs.colorado.edu and schwartz@cs.colorado.edu.

Below is some more information about the WAIS interface to Essence.

Essence exports its indexes through WAIS's search and retrieval
interface, allowing users to use tools such as waissearch and the
X Windows-based graphical user interface xwais. To generate
WAIS-compatible indexes, Essence uses WAIS's indexing software to index
the Essence summary files. This mechanism generates full-text WAIS
indexes from the Essence summary files. We modified the WAIS indexing
mechanism to understand the format of the Essence summary files, so
that it generates meaningful WAIS headlines. These headlines provide
users with a short description of a single file, usually a filename.
With Essence, headlines represent a file's core filename, its actual
filename, and its file type.

To support additional file types, WAIS must be recompiled with new
procedures that understand these file types. With Essence, one need
only write a new summarizer, add its name to a configuration file, and
add new heuristics for identifying the file type; no recompilation is
necessary. In this sense, Essence modularizes the typed-file indexing
extensions that WAIS can use, because it removes the keyword extraction
process from WAIS and places it instead in Essence. Essence is better
suited to incorporating new file types, and can be quickly adapted to
become a comprehensive indexing system.

The following waissearch output shows an example search of an index
generated by Essence of the ftp.cs.colorado.edu anonymous FTP file
system. It shows an ordered list of the ten files that best match the
keyword netfind. Netfind is an Internet user directory service. The
headlines have up to three fields representing the matching file: the
core filename, the filename (if different from the core filename), and
the file type.
------------------------------------------------------------
csh% waissearch netfind
 1: /cs/ftp/techreports/schwartz/PostScript/Techniques.Wide.Area.ps.Z  Techniques.Wide.Area.ps  PostScript
 2: /cs/ftp/techreports/schwartz/PostScript/ALL.PS.tar.Z  PostScript/Techniques.Wide.Area.ps  PostScript
 3: /cs/ftp/distribs/netfind/netfind3.10.tar.Z  ServerShell/nsh.c  C
 4: /cs/ftp/distribs/netfind/README  README
 5: /cs/ftp/distribs/netfind/netfind3.10.tar.Z  README  README
 6: /cs/ftp/distribs/netfind/netfind3.10.tar.Z  Doc/netfind.1  ManPage
 7: /cs/ftp/techreports/schwartz/PostScript/Proj.Overview.ps.Z  Proj.Overview.ps  PostScript
 8: /cs/ftp/techreports/schwartz/PostScript/RD.Comparison.ps.Z  RD.Comparison.ps  PostScript
 9: /cs/ftp/techreports/schwartz/PostScript/ALL.PS.tar.Z  PostScript/Proj.Overview.ps  PostScript
10: /cs/ftp/techreports/schwartz/PostScript/ALL.PS.tar.Z  PostScript/RD.Comparison.ps  PostScript
csh%
------------------------------------------------------------

Consider the effectiveness of the example search shown above. The best
match is a PostScript paper that discusses a number of techniques for
distributed information systems, with particular emphasis on techniques
demonstrated by Netfind; the second match is the same file, but found
in the compressed tar distribution ALL.PS.tar.Z. The third match is the
C source code for the interactive user interface to Netfind. The fourth
match is the README file found in the Netfind distribution directory;
the fifth match is the same file, but found in the compressed tar
distribution netfind3.10.tar.Z. The sixth match is the UNIX manual page
for Netfind. The remaining matches are PostScript papers in which
Netfind is discussed.

In WAIS, a user retrieves files by selecting a matching headline. With
Essence, if the headline represents a file hidden within a nested file
(such as the first headline in the example), the summary file is
retrieved instead of the hidden file itself.
If the headline represents a plain file (such as the fourth headline in
the example), the summary file is also retrieved. This functionality
requires allocating storage for both the required summary files and the
index. However, it allows users to browse through remote file systems
by retrieving and viewing small summary files without having to
retrieve complete files. This is useful when trying to decide whether
to transfer large files across a slow network.
"
)
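
The recursive unraveling of nested files described in the announcement can be illustrated with a minimal Python sketch. This is not the Essence implementation, only one way the idea might look. The function name `unravel` and the path-joining convention are hypothetical, gzip stands in for UNIX compress (which the Python standard library cannot decode), and uuencoded input is not handled.

```python
import gzip
import io
import tarfile


def unravel(name, data):
    """Yield (path, payload) for every leaf file hidden inside `data`.

    Hypothetical helper: container types are recognized by magic numbers,
    unpacked in memory, and their contents are examined again, so any
    depth of nesting is followed.
    """
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        inner = name[:-3] if name.endswith(".gz") else name
        yield from unravel(inner, gzip.decompress(data))
    elif data[257:262] == b"ustar":  # tar magic string at offset 257
        with tarfile.open(fileobj=io.BytesIO(data)) as archive:
            for member in archive:
                if member.isfile():
                    payload = archive.extractfile(member).read()
                    yield from unravel(f"{name}/{member.name}", payload)
    else:
        # Not a recognized container: treat it as a leaf to be summarized.
        yield name, data
```

Each leaf that comes out of `unravel` would then be handed to a summarizer for its file type, which is how a single archive can contribute several headlines to the index.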
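
The extension mechanism mentioned in the server description (write a new summarizer, register its name, add a heuristic for identifying the file type) might be sketched in Python as follows. Everything here is an assumption made for illustration: the registry, the decorator, the rule table, and the two toy summarizers are hypothetical and do not reflect the Essence configuration file format. The type names are borrowed from the example waissearch output.

```python
import re

# Hypothetical registry: file type -> summarizer function.
SUMMARIZERS = {}

# Hypothetical filename heuristics, tried in order: (pattern, file type).
TYPE_RULES = [
    (re.compile(r"(^|/)README$"), "README"),
    (re.compile(r"\.c$"), "C"),
    (re.compile(r"\.[1-8]$"), "ManPage"),
]


def summarizer(file_type):
    """Decorator that registers a summarizer for one file type."""
    def register(func):
        SUMMARIZERS[file_type] = func
        return func
    return register


def identify(filename, data):
    """Guess a file type from the content first, then from the filename."""
    if data.startswith(b"%!"):  # PostScript files begin with "%!"
        return "PostScript"
    for pattern, file_type in TYPE_RULES:
        if pattern.search(filename):
            return file_type
    return "Text"


@summarizer("C")
def summarize_c(data):
    """Keep identifiers that are directly followed by a parenthesis."""
    names = re.findall(rb"\b([A-Za-z_]\w*)\s*\(", data)
    return sorted({name.decode() for name in names})


@summarizer("Text")
def summarize_text(data):
    """Fallback summarizer: the first ten whitespace-separated words."""
    return data.decode("latin-1").split()[:10]


def summarize(filename, data):
    """Return (file type, keywords), falling back to the Text summarizer."""
    file_type = identify(filename, data)
    handler = SUMMARIZERS.get(file_type, SUMMARIZERS["Text"])
    return file_type, handler(data)
```

In this sketch, supporting a new file type means adding one decorated function and one rule; the dispatch code in `summarize` does not change.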