TopNode status report

percival@bronze.ucs.indiana.edu Fri, 13 November 1992 22:52 UTC

Message-Id: <9211132026.AA29812@kona.cc.mcgill.ca>
Date: Fri, 13 Nov 1992 15:34:32 -0500
To: pacs-l@uhupvm1.bitnet, cni-directories@cni.org, nir@cc.mcgill.ca
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: percival@bronze.ucs.indiana.edu
Subject: TopNode status report

TopNode Status Report


Background

TopNode is a project initiated by the Coalition for Networked Information
(CNI) to produce a directory of directories of network information sources.
CNI issued a Call for Statement of Interest and Experience for the TopNode
project in November '91. In March '92, Indiana University and Merit Network
Inc. were selected to lead the TopNode effort.

At Indiana University, the TopNode project is a collaborative effort
between University Computing Services, the Indiana University Libraries and
the School of Library and Information Science. Indiana's responsibilities
include collecting, cataloging, entering and maintaining the TopNode data.
In addition, IU is assisting CNI with development of database applications
and coordination of the project. Merit's responsibilities include
identifying new network information sources, sharing this information, and
investigating mounting an X.500-based TopNode database. CNI's
responsibilities include developing database applications and project
coordination.


Goals for TopNode

1. Develop an end-user searchable database of network information sources,
services and resources.

2. Act as a clearinghouse for the registration of network information
resources.

3. Develop workable models for the data elements required to describe
network information resources.

4. Working with other groups, pursue distributed models for the entry,
management, maintenance and distribution of TopNode records. In particular,
we want to track recent developments coming out of the Internet
Engineering Task Force (IETF).

5. Serve as a source of network information records that may be distributed
in a variety of formats and mounted using a variety of information
retrieval technologies.


Scope of the TopNode Project

The TopNode project, as originally envisioned by the CNI Working Group on
Directories and Resource Information Services, was to create a top level
directory of directories of network information sources - a directory which
pointed to other directories, catalogs and lists of network information
sources and resources. Early on, the TopNode implementation team decided to
extend the scope of the TopNode project beyond just providing information
about other directories, catalogs and lists, to include information about
individual information sources and resources. While early data collection
has focused on information about directories, the group felt that the data
element definitions that were put in place should be able to accommodate
information about individual resources.


Data Element Definitions

Two conflicting concerns guided our thinking when defining the data
elements for the TopNode database. One was to keep them as simple as
possible. Self-registration, where individuals or organizations offering
network information or services register their service by filling out a
resource template, will be an important source of information for the
TopNode database. We felt that the simpler the registration templates, the
more likely a self-registration process would be to succeed. Thus,
simplicity was an important concern as we defined the data elements for the
database.

Our other concern was that in the long term, we would like
workstation-based navigation applications to be able to make use of the
information in the TopNode database. These workstation clients would allow
users to search the database and would then negotiate a connection to a
service or automatically retrieve information on behalf of the user. As a
result, data elements concerned with navigation/connection information have
been broken out into individual fields, increasing the complexity of the
information which must be supplied for each resource. We felt, however,
that by separating this information into individual fields, we would avoid
the need to parse this information out of larger text fields in the future.
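
To illustrate why separate navigation fields help, here is a minimal Python sketch. The record layout, the field names (host, port, login), the TelnetAccess class and the parse_free_text helper are all hypothetical, not the TopNode data elements, which are to be presented in a follow-on document. It contrasts a record with discrete fields, from which a client can assemble a connection command directly, with a free-text description that has to be pattern-matched and fails as soon as the wording changes.

```python
import re
from dataclasses import dataclass

# Hypothetical record layout: these field names are illustrative only,
# not the actual TopNode data elements.
@dataclass
class TelnetAccess:
    host: str
    port: int = 23   # default Telnet port
    login: str = ""  # e.g. a guest account name, shown to the user if needed

    def connect_command(self) -> str:
        """Build a command a client could run; no text parsing required."""
        cmd = f"telnet {self.host}"
        if self.port != 23:
            cmd += f" {self.port}"
        return cmd


def parse_free_text(text):
    """Hypothetical fallback: pull host and port out of prose.

    Works only for one phrasing and returns None for anything else.
    """
    m = re.search(r"[Tt]elnet to (\S+) port (\d+)", text)
    return (m.group(1), int(m.group(2))) if m else None


record = TelnetAccess(host="lib.example.edu", port=2300, login="guest")
print(record.connect_command())  # telnet lib.example.edu 2300
print(parse_free_text("Telnet to lib.example.edu port 2300, login as guest"))
# ('lib.example.edu', 2300)
print(parse_free_text("Connect via telnet: lib.example.edu 2300"))  # None
```

The structured record yields the same answer regardless of how a registrant might have phrased the description, which is the point of breaking the fields out.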



Different Templates for Different Types of Resources

The data elements necessary to fully describe an information resource or
service differ depending on how the resource is accessed. For example, when
describing a file available at an anonymous FTP site, a field describing
the complete path to the file must be included. File path information is,
however, irrelevant when describing a resource which is accessed via
Telnet or when describing an electronic journal. Because different
descriptors are required based on how a resource is accessed, separate data
element templates were created for different resource types or access
methods. We chose to create separate templates rather than attempting to
create a universal (and lengthy) template of data elements that could be
applied to any network information resource. By creating separate
templates, only data elements that are relevant to a specific resource type
are present on any given template.

In each template, there is a set of common data elements which can
generally be applied to any resource type (e.g. title, contact person,
etc.), followed by a set of elements that are specific to the particular
resource type. Currently, templates have been developed for the following
resource types: FTP accessible resources, resources accessible via terminal
emulation (Telnet & TN3270), e-mail accessible resources (e.g. e-journals
and listservers), conventional print resources (those concerned with
networks and network information retrieval) and service centers (e.g. NICs,
NOCs, supercomputer centers, etc.). Additional templates for other resource
types such as WAIS and Gopher servers are planned. The specific data
elements currently defined for each of these templates will be presented in
a follow-on document.
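
The common-plus-specific layout of the templates maps naturally onto a base record type extended once per access method. The Python sketch below is only an illustration: title and contact follow the examples of common elements given above, while the remaining field names and the template_fields helper are hypothetical, since the actual elements are defined in the follow-on document. It shows that each template carries only the elements relevant to its resource type, so the FTP template has a file path and the terminal-emulation template does not.

```python
from dataclasses import dataclass, fields

# Common elements shared by every template (title and contact person are
# named in the text; everything below them is a hypothetical illustration).
@dataclass
class CommonElements:
    title: str
    contact: str


@dataclass
class FTPResource(CommonElements):
    host: str
    path: str  # complete path to the file: relevant only to FTP resources


@dataclass
class TerminalResource(CommonElements):
    host: str
    protocol: str  # "telnet" or "tn3270"


@dataclass
class EmailResource(CommonElements):
    address: str  # e.g. a listserver or e-journal subscription address


def template_fields(cls):
    """List the data elements a registrant would fill in, common ones first."""
    return [f.name for f in fields(cls)]


print(template_fields(FTPResource))       # ['title', 'contact', 'host', 'path']
print(template_fields(TerminalResource))  # ['title', 'contact', 'host', 'protocol']
print(template_fields(EmailResource))     # ['title', 'contact', 'address']
```

Adding a template for a new resource type (for example Gopher or WAIS) would then mean adding one subclass, leaving the existing templates untouched.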


The TopNode Database

The TopNode database is being developed using BRS/Search software on CNI's
server. BRS is an extremely flexible text retrieval tool and is particularly
good at outputting records in a variety of formats, a feature which will be
essential for unloading the TopNode records for use by other applications.
The TopNode database is currently made up of five databases corresponding
to the templates described above (FTP, terminal emulation, e-mail, print,
and service centers). For searching purposes, these databases have been
logically concatenated into a single TopNode database. We currently have
approximately 360 sample records loaded in the database.
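
The idea of searching five component databases as though they were one can be sketched in a few lines of Python. This does not reflect how BRS/Search implements logical concatenation; the database names, the sample titles and the search function are hypothetical. Each hit is reported together with the name of the component database it came from, so a client can still tell which template describes the record.

```python
# Hypothetical stand-ins for the five per-template databases.
# The sample titles are invented for illustration.
databases = {
    "ftp": [{"title": "Sample FTP archive directory"}],
    "terminal": [{"title": "Sample library catalog directory"}],
    "email": [{"title": "Sample list of electronic journals"}],
    "print": [],
    "service_centers": [{"title": "Sample network information center"}],
}


def search(term, dbs=databases):
    """Yield (database, title) for every record whose title contains term.

    The component databases are scanned one after another, so callers see
    a single result stream, as if searching one combined database.
    """
    term = term.lower()
    for db_name, records in dbs.items():
        for record in records:
            if term in record["title"].lower():
                yield db_name, record["title"]


for hit in search("directory"):
    print(hit)
# ('ftp', 'Sample FTP archive directory')
# ('terminal', 'Sample library catalog directory')
```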


What's Next

1. Widespread discussion of our general approach and the specific data
elements currently defined.

The first task in developing the TopNode database was to decide on the data
elements we felt were necessary to describe network information resources.
In order to get this project moving, the TopNode project team made some
initial decisions about the set of data elements, realizing that it would
be impossible to get started quickly if we attempted to get consensus from
all interested parties on the network. It is time, however, to take
advantage of the considerable expertise that exists in this area and
solicit widespread review of the data elements we have defined.

2. An analysis of the appropriateness of the current data elements in light
of the sample records we have entered.

In looking at the 360 records currently in the TopNode database, it is clear
that while some of the data elements we defined may be theoretically
appropriate, they may not be practical. We are now in the process of
examining these records to see where changes are appropriate in light of
the actual data we have collected.

3. The completion of an end-user application for searching the TopNode
database.

We are in the process of developing an end-user BRS application on CNI's
server. At the Fall CNI meeting we will demonstrate an early version of
this application. In the coming months we will be refining this end-user
application.

4. Develop the applications required to dump the database in a variety of
formats. Establish a distribution methodology. 

We will be developing applications that will allow the TopNode database to
be unloaded in a variety of formats. It is unrealistic to think that the
Coalition's server has the capacity to support access to the TopNode
database by the Internet as a whole. We will therefore look to unload the
TopNode records in other formats so that the information can be delivered
using other information retrieval technologies (e.g. WAIS, perhaps MARC
records that can be loaded into local online library catalogs). Part of
this process will be establishing procedures that allow timely updates.
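
Unloading the same records in more than one format can be illustrated with a small Python sketch. The records, the two output formats (a tagged "field: value" layout and a tab-delimited file) and the function names are assumptions for illustration only; MARC output, mentioned above, is not attempted here. Keeping one exporter per format over a common set of records means a new delivery technology needs only a new exporter, not a change to the database itself.

```python
import csv
import io

# Hypothetical records; the real TopNode unload formats were still undecided.
records = [
    {"title": "Sample FTP archive", "contact": "admin@ftp.example.edu"},
    {"title": "Sample library catalog", "contact": "help@lib.example.edu"},
]


def to_tagged(recs):
    """One 'field: value' line per element, a blank line between records."""
    return "\n\n".join(
        "\n".join(f"{k}: {v}" for k, v in rec.items()) for rec in recs
    )


def to_tab_delimited(recs, columns=("title", "contact")):
    """A header row plus one tab-separated row per record."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for rec in recs:
        writer.writerow(rec.get(c, "") for c in columns)  # missing -> empty
    return out.getvalue()


print(to_tagged(records))
print(to_tab_delimited(records))
```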


5. Develop the self-registration process. Invite authors who currently
maintain directories to participate.

Many individuals have taken it upon themselves to maintain directories of
network information sources (e.g. library catalogs, e-journals,
listservers, CWISes, printed publications dealing with network information
retrieval issues, etc.). We would like to take advantage of this effort by
working with these authors to make this information available via the
TopNode database. In addition, other individuals and organizations are
mounting new network accessible resources on a daily basis. We will be
establishing a self-registration process which will allow information about
these new network information resources to become part of the TopNode
database.


You may direct any questions concerning TopNode to me or to Craig
Summerhill at CNI (craig@cni.org).


Pete Percival
University Computing Services
Indiana University
percival@indiana.edu
