Part IETF trip report: networked information retrieval
Jill.Foster@newcastle.ac.uk Fri, 10 April 1992 16:31 UTC
Date: Fri, 10 Apr 1992 14:33:09 +0100
From: Jill.Foster@newcastle.ac.uk
Subject: Part IETF trip report: networked information retrieval
To: osi-ds@cs.ucl.ac.uk
Message-Id: <emu-ov07.1992.0410.143309.cl54@uk.ac.ncl.mts>

The following is an extract from a trip report on the IETF (Internet
Engineering Task Force). These notes cover ONLY the discussions on
networked information retrieval. The full report may be obtained from
Mailbase (see below).

Note from report:

The following informal report is in note form and deals mainly with
the areas of User Support and Networked Information Retrieval. Whilst
it is as accurate as I can make it, it is naturally a personal account
and may be inaccurate due to lack of background information or
misinterpretation of what I heard. Corrections of fact are welcome,
but any discussion of items contained here would be best directed to
the appropriate mailing lists (in particular the nir mailing list
mentioned below, which is now operational).

This report will be stored on the UK Mailbase Server. To retrieve a
copy, email to Mailbase@mailbase.ac.uk with the following command in
the body of the message:

   send rare-wg3-usis ietf.03.92

The sections on networked information retrieval follow.

Jill Foster - Newcastle University, UK
Chairman: RARE WG3 USIS Subgroup


Networked Information Retrieval
===============================

This was discussed in the following groups: IAFA (Internet Anonymous
FTP Archives), Living Documents BOF and WAIS/X.500 BOF.

Each and every network user has the possibility of publishing
information widely on the network. As the Internet grows rapidly, the
problems of resource discovery and networked information search and
retrieval increase daily. Several groups have (initially)
independently tried to tackle some of the problems.

One of the major attractions of this IETF (from my point of view) was
that many of the major players in the NIR arena would be in attendance
and that two BOFs (Living Documents and X.500/WAIS) were being held to
discuss various aspects of NIR.

The groups concerned included:

   Archie people:    Peter Deutsch and Alan Emtage
   World Wide Web:   Tim Berners-Lee
   Prospero:         Cliff Neuman
   Gopher people
   X.500 group:      Steve Kille, Paul Barker, Wengyik Yeong

as well as representatives from the

   CNI architectures group: Clifford Lynch

Leading up to the BOFs there were several informal sessions over lunch
and dinner and in the terminal room.


Living Documents BOF
====================

The Living Documents BOF was originally intended to address the
problem of managing documents that are continually updated (such as
the NOC-tools RFC, the user-bibliography, user-glossary etc). However
it developed (as expected) into a wide ranging discussion and
brainstorming session on the problems of resource discovery and
information retrieval.

There had been long discussions on a number of mailing lists leading
up to the IETF. Peter Deutsch had proposed a UDSN (Universal Document
Serial Number) which should be the equivalent of the ISBN for books.
This would be a contents ID or fingerprint and would enable several
instances of the same information to be recognised as being
equivalent.

There was discussion on what constituted equivalence rather than a
derived work. Were PostScript and ASCII versions of the same file
equivalent? (Most thought yes). But what if the PostScript version
contained diagrams or graphics not in the ASCII version? (What if it
was translated into another language? etc.)

For each "document" there was a need for:

   o Catalogue information (Title, author, creation date etc.)
   o Location and access information

Also required:

   o UDSN
   o UDI (Universal Document Identifier (see later))
   o Authentication and access control
   o Version control
   o Editorial control
   o Discovery mechanisms
   o Ability for information providers to publish/announce
     items/documents

One possible UDSN would be a MARC record; however, there are several
standards here (US (several), UK, etc.).
Clifford Lynch (CNI architectures group) felt that use of MARC was not
really appropriate here in any case.

Amongst other problems discussed was the need to refer to bits of
documents. However this discussion was shelved, as the problem of
dealing with complete documents should be addressed first.

There is a real need for librarians to bring their expertise to these
issues. The Coalition for Networked Information (CNI) is working on
doing just that.

There is a short term need to be able to determine whether two
documents are the same (UDSN) and a need to have a top level globally
unique name to refer to one instance of a document (UDI).

It was agreed to set up an nir discussion list, nir@cc.mcgill.ca, to
discuss these issues further.


IAFA: Internet Anonymous FTP Archives Working Group
===================================================

A document had appeared shortly before the IETF. Briefly, it detailed
how information about the files in a public file archive could be made
available. The current problem is that tools such as Archie are not
able to discover automatically detailed information about a file
(apart from its name). The proposal is to have information about the
file archive and a "file catalogue card" containing various attributes
of the file (including keywords and a description or abstract)
available as a separate file, either in the same directory as the file
or in a shadow directory.

The various attributes to be included on this catalogue card were
discussed and the paper will be updated in the light of this.

I mentioned the Draft RFC from the OSI-DS WG on "Representing Public
Archives in the Directory", and recommended that the attributes
required for registration in the directory should be included in the
IAFA Archive description file.

I suggested the idea of a Quality of Service attribute. Some services
have a high availability and are run by professionals; other archives
are run on a best endeavour basis by volunteers.

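As a rough illustration of the "file catalogue card" idea, here is a
minimal Python sketch that reads a card stored as a separate text file
of "Attribute: value" lines. The card layout, the attribute names
(File, Description, Keywords, Quality-of-Service) and the function name
parse_card are all assumptions made for this example; they are not
taken from the IAFA document, whose attribute list was still under
discussion.

```python
def parse_card(text):
    """Parse 'Attribute: value' lines into a dict; keys are lower-cased."""
    card = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        card[key.strip().lower()] = value.strip()
    return card


# Hypothetical card; attribute names are illustrative, not from the IAFA draft.
CARD_TEXT = """\
File: report.txt
Description: Trip report on networked information retrieval
Keywords: NIR, WAIS, X.500
Quality-of-Service: best-endeavour
"""

card = parse_card(CARD_TEXT)
keywords = [k.strip() for k in card["keywords"].split(",")]
print(card["quality-of-service"], keywords)
```

An indexing tool could read such cards alongside the file names it
already collects, and so search on keywords or descriptions as well.
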
A further suggestion was the need to be able to register logical
archives, that is, separate archives that happen to reside on the same
machine.


X.500/WAIS BOF
==============

This was really a companion BOF to the "Living Documents" BOF, which
had seen a wide ranging discussion on networked information retrieval.
In contrast this BOF was more structured and started with
presentations on the various applications.

There is a need to have some sort of Universal Document Identifier
that could be used by the various applications.

WAIS:
====

John Curran provided a short description of WAIS. (Unfortunately no
one from "Thinking Machines" was able to attend the IETF). However
John had a reasonably good knowledge and experience of the application
(NNSC have a WAIS interface to the RFCs).

   _______________             _______________   _______________
  |               |           |               | |               |
  |               |           |               | |   Files of    |
  |               |           |     WAIS      | |  Information  |
  |    Client     | --->----- |    Server     | |  (e.g. RFCs)  |
  |               |           |               | |               |
  |               |           |               | |               |
  |_______________|           |_______________| |_______________|

The WAIS Server has a pre-built inverted index of all the words in a
document. (This does not make sense for non-text files of course). It
also holds other information about the document (size etc).

A client will formulate a query on behalf of the user and send it to
the WAIS Server, which will search the index and retrieve and return
the document using the same protocol (Z39.50). Use of a pre-built
index makes this very fast.

One WAIS Server may have multiple sources (and multiple indexes).
There are various WAIS Servers in existence, but there is currently no
way of querying which Server is responsible for which source.

The possibility of putting WAIS descriptor files on a Server or in an
X.500 directory was discussed.

Differences: Z39.50/WAIS:

WAIS specifies how a query should be formulated (Z39.50 does not).
WAIS uses Z39.50 (slightly modified) as the transport protocol.
WAIS also provides relevance feedback.

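As an illustration of why a pre-built inverted index makes searching
fast, here is a minimal Python sketch. It is not the WAIS
implementation: it leaves out Z39.50, relevance ranking and relevance
feedback entirely, and the function names and sample documents are
invented for the example. It simply maps each word to the set of
documents containing it, so that a query becomes a few set lookups
rather than a scan of every file.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical sample documents standing in for the indexed files.
documents = {
    "rfc-a.txt": "Simple mail transfer protocol",
    "rfc-b.txt": "File transfer protocol",
}
index = build_index(documents)
print(search(index, "transfer protocol"))
```

The index is built once, ahead of time; each query then touches only
the entries for the words it contains.
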

OSI-DS 22:
=========

Wengyik Yeong presented his draft RFC on representing a public archive
in the directory. He also described a project using this. A file can
be found using the directory and then automatically retrieved using
the specified access method.

World Wide Web:
==============

Tim Berners-Lee gave a talk on the World Wide Web. This project has
been funded to provide a service to the world wide community of high
energy physicists. It is a hypertext system. The philosophy behind it
is that a user should be able to point and click on an item name or a
word within a document, and the associated document would be retrieved
from wherever in the world and presented to the user in an appropriate
format - without the user having to be aware of where the document is
located or what the access method is. These details are hidden in the
hypertext links.

There were server programs for many information servers, gateways to
WAIS, Archie and gopher, and client programs for various user
machines.

The overlap between WWW, WAIS, Archie and Prospero was indicated, and
the need for a UDI by all of these was discussed. Each application
(apart from WAIS) uses a "handle" for a file which can be prefixed by
something appropriate. WAIS currently can only have "WAIS" as the
prefix. There is a need for it to be more flexible.

Mailing lists: WWW-interest@nxoc01.cern.ch
               WWW-talk@nxoc01.cern.ch

OSI-DS 25:
=========

Steve Kille discussed this paper, "Representing the Real World in an
X.500 Directory".

A Listing Service may be used to group like information items
together, for example to provide a Yellow Pages Service. It could
represent the members of a special interest group, or group documents
on a particular subject. Services such as Archie could be considered
to be Listing Services.

One imagines an information universe in which Information Brokers
provide different subject based (say) views via their listing service.
One would then need to locate the various listing services (using a
mechanism such as a directory?)

OSI-DS mailing list: osi-ds@cs.ucl.ac.uk
Subscriptions:       osi-ds-request@cs.ucl.ac.uk

UK British Library Project:
==========================

Paul Barker described a project, sponsored by the BL, to represent
grey literature (unpublished research papers) in the Directory. The
project is thought to be unlikely to succeed - but one of the aims is
to demonstrate whether or not it is possible. They will take the (UK)
MARC records and model these within X.500. They might also consider
trying to provide a listing service so that the documents might be
retrieved more readily by subject area.

Prospero:
========

Cliff Neuman described Prospero. It follows a file system (rather than
hypertext) model. It is built on UDP. It has the notion of a Directory
which contains links to other objects (other directories or files). It
returns the link to the information object and then automatically
retrieves the file by the appropriate access method (Archie, WAIS,
nntp, WWW - soon!, NFS, ftp etc.) It has linked very successfully with
archie. Cliff stated that he expected to be able to use X.500 to
translate between the document ID and how to get the document.

With Prospero the user has his own view of the global information base
(or has a view built for him). Cliff thought there should be multiple
name spaces - but the difficulty would be that these would need
representing near the top of the directory tree. With multiple user
chosen views this would be difficult to manage. Also, two users might
refer to an object by different handles, each relative to their
individual name spaces - difficult when passing references (say in a
mail message) from one person to the other.

Mailing list: info-prospero@isi.edu

System 33
=========

Larry Masinter talked about a project at Xerox PARC.
There was the concept of a:

   - HANDLE: 32 byte number (is a content ID)
   - FILE Location (6 part):
     Protocol; Host; Path; piece; format; timeout
   - Description (normal "Catalogue" information: name, author, etc.)
   - Document

There is format negotiation when a document is retrieved.

Access control was also considered. The ACL is part of the
description. The Server exploits multiple protocols for search and
retrieve.

There is a problem with dealing with different types of document:

   - applications for jobs
   - product specs.
   - memos
   - contracts
   - faxes
   - etc.

It is difficult to normalise the attributes of a general document.


Summing up
==========

Tim Berners-Lee summed up by saying that all applications described
had a need for a Unique Doc ID and for a name service for this. The
UDI needed to be resolvable. (This is not the same as the UDSN, the
content ID described earlier.)

There should be a WG on details of UDI (but this needs a better name)
and a separate one for UDSN (and the need for a single resolver for
these). Chris Weider agreed to co-author a document on the issues.

I suggested that it might be useful to try just doing this. That is,
to have a pilot project to try putting UDIs in the directory for a set
of files and to have the gopher, Prospero and archie people try to
utilise these.


Concluding Remarks
==================

So all in all a very worthwhile meeting. The problems of NIR have been
aired. The various players in the field have been made aware (if they
were not) of the work of the others. Some plans for practical
collaboration have already been formed. These issues will be discussed
further at the Joint European Networking Conference in May, RARE WG3
USIS meetings, future IETF meetings and of course on the various
mailing lists.

Further links have also been made between the IETF User Services Area
people and RARE WG3 USIS members, which will enhance collaboration.

Finally, a reminder that these notes are my view of the IETF.
They may not be an accurate view, and certainly do not cover the wide
range of topics discussed at the workshop.

Jill Foster
09.04.92