Re: proposal for subjects in subject tree

Alan Emtage <bajan@bunyip.com> Wed, 09 December 1992 04:43 UTC

Message-Id: <9212090406.AA02953@mocha.cc.mcgill.ca>
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Alan Emtage <bajan@bunyip.com>
Date: Tue, 8 Dec 1992 23:06:55 -0500
X-Mailer: Mail User's Shell (7.2.3 5/22/91)
To: nir@bunyip.com
Subject: Re: proposal for subjects in subject tree
Cc: eurogopher@ebone.net, gvl@unt.edu, iafa@bunyip.com, jkrey@isi.edu, uri@bunyip.com, Anders Gillner <awg@sunet.se>

Hello All,
	A little while back I promised various people to write up what I
would call "personal musings" from my point of view of what is currently
happening with the Internet, its information systems and the "technical"
and library communities. This is intended to focus my own thoughts as
well as perhaps provide a set of targets for people to shoot at: in a
sense a true "Request for Comments". Some of the writing gets off the
beaten track, but I trust you will bear with me on this. I will also warn
you that this is rather lengthy so you might want to grab a cup of coffee
before trudging through my diatribe.

I also have to say that I have no formal training in library sciences
(I'm a comp sci person), but over the past while have become increasingly
involved in the role of information services as well as interacting with
the librarians.

I would appreciate any feedback on this. 


-Alan



			Alan's Musings
			--------------

A fundamental shift has taken place in the Internet and Library
communities in the past two years or so. The technical communities (the
"techies") after spending many years building the Internet infrastructure
are now being asked (and are asking themselves) "So, now what do we _do_
with the darned thing?". The fact that services like archie, WAIS, W3,
Gopher and the rest came into existence at about the same time is proof
once again that necessity is the mother of invention. At about the same
time, the library community "woke up" to the Internet and the potential
that it held. I certainly don't mean to imply here that there have not
been groups in both camps who have recognized the situation for a while,
but there came a point where the critical mass was reached and the larger
community got the collective "Aha!".

Please note that much of this has a particularly North American flavor. I
haven't had much contact with the librarian groups in either Europe or
Asia but hope to do so in the coming months. Some of the points made
below may seem obvious.... let me say that I have had enough interactions
with both communities to realize that this is not always so. Also, I'm
painting with a very broad brush here... please don't take these comments
personally.

As a minimum I see the following points as creating an essential
foundation if we in the two communities are to work together and provide
the general public with systems that meet their needs for this decade and
beyond.

Point 1: Getting together
-------------------------

First and foremost the stereotypes have to be broken down. While this
smells a bit like social engineering, it is a problem that needs to be
recognized and addressed. Without trying to sound silly here, it is still
partially true to say that the techies (and many people in general) think
of your average librarian as a little lady with a _really_ tight hairdo,
pursed lips and cat-eyed glasses sitting around in dimly lit rooms
reeking of moldy paper.  Similarly, the "Poindexter" (sorry to
non-Americans here for the culture-specific reference :-),
pocket-protector-and-sliderule image comes to mind when the general
public thinks of techies. These stereotypes get in the way of
understanding what each of us does and what knowledge, expertise and
experience each group brings to the table. 

Contacts between the two communities have to be formalized, strengthened
and, perhaps most importantly at this juncture, _funded_. It is nice when
the two groups have conferences in the same city at the same time, and as
welcome as that kind of "cross-pollination" is, it is simply no longer
good enough, as it depends on the motivated interest of individuals, not
institutional will. The organizations in both communities need to
establish liaison groups, lines of communication (like common mailing
lists) and ultimately, face-to-face gatherings in which both feel welcomed
and accepted. 

We all have to realize that neither group has the financial means nor
expertise to do this on their own. We really do _need_ one another.


As with any meeting of the minds, both groups have to understand the
needs, fears and expectations of the other.


Point 2: Getting it right vs. Getting it now
--------------------------------------------

We techies need to realize that the community that the library serves is
hungry for something right now, not tomorrow or the day after. The users
have an idea of what's out there and they want to know how to find what
they are looking for. This of course is also true of the Internet
environment.  However, the traditional Internet user was a lot more
likely to put up with the indignities of the FTP client interface and the
word of mouth method of information dissemination than your average high
school student is today.

This came home to me recently when I had the opportunity to meet with the
Library of Congress MARC group to provide informal feedback on the
upcoming proposed changes to the MARC record. I have to admit that my
first thoughts to many of the proposed fields were "You can't do it that
way... this information is inherently dynamic and you can't store dynamic
information in static records". We techies often have an aversion to what
_we_ view as "bandaid solutions" due to the knowledge that the "temporary
fix" may be around for generations to come. Rather than getting something
out there which works now but isn't a long term solution, we want to sit
down and get it right: how many of us have cursed the designer of bathroom
taps where the cold and hot water spigots don't mix the water before it
gets to you...  you alternately get your hands frozen or scalded. Couldn't
the person who designed it have just sat down for a couple minutes more and
said "Hmmm, maybe I should think about this a bit before I unleash it to
the unwashed masses?"  :-) 

We have to understand that for the users that don't have access to _any_
of this information now, something is better than nothing even if it
means that the information may be out of date when it gets to them.

I find myself now coming down somewhere in between the two camps: give
the users the data and make it as accurate as you can at the time, but
make sure that they realize that this information may be out of date and
at best they must think of it as a hint. In the meantime, the long
process of designing technical solutions to providing a higher quality of
service can be worked on. It doesn't help your student looking for a
particular piece of information on the network to say "We'd love to help
you with that, but we'd like for you to get it right all the time... and
by the way, this will be ready in 1999".

The problems that we are attempting to tackle are enormous in scope and
while I have no doubt that they will ultimately be solved, it will take a
lot of talk, work and time.

Point 3: The Spectrum of Quality
--------------------------------

Librarians have long realized that there is no chance that they could
catalogue every piece of information that is made available to them.
There are possibly millions of publications each year which are never
proffered for cataloging and even amongst those which are, there are a
significant number which are not considered "worthy" of the time
consuming and expensive process.

Whether we like it or not (and I do) we "ain't seen nothin' yet". It is
not necessary or even desirable to try to catalogue all (or maybe even a
large percentage) of the information which is, and will be, published
(made available for public or semi-public access) over the next, say,
ten years. However, I have seen comments in the past suggesting that
certain members of the library _and_ technical communities believe that
there is some element of "control" still possible in the process. I would
like to put their mind at ease: We are currently looking at an Internet
which claims on the order of 10 million users and is growing
exponentially, and I have seen estimates of up to 300 million for 1995. We can
view each and every one of these people as potential "publishers" of
information. There is no way that we can even begin to think about trying
to control this process. Why would we want to?

The local corner store does not employ a professional librarian to
organize the magazines on their racks. When we go into the store, we do
not expect a reference desk: we browse through the rack and see if there
is anything interesting and we may or may not find what we are looking
for (if in fact we were looking for something specific to begin with).
However, most of us aren't devastated if we don't find what we are
looking for.... we go on to another store or just give up for the time
being. We realize that we get what we are paying for: we probably
wouldn't pay for the corner-store-cum-librarian.

On the other hand, many of those same magazines are available at
your local public library. In this context we expect a different level of
service... one where the magazines are organized by subject and where we
can probably find back-issues for several years. Many of us are also
aware (even in the public libraries) that significant amounts of money are
being invested to keep this infrastructure in place.

I trust I don't have to beat this analogy to death :-) The network will
consist of "corner stores" and "libraries" and many things in between,
much as our day to day life does now... and that is good. If your corner
store consistently does not provide the magazines and other services
that you want you'll find another store. If enough people in your
neighborhood (read: "community") feel this way the corner store will go
out of business, hopefully replaced by one which is more sensitive to
your needs. We expect more consistency from the professional information
providers but in a for-fee world (you have to pay to use the library)
exactly the same market forces will be at work. If you don't feel that
you are getting your money's worth, you'll take your business somewhere
else and that private library will go out of business.... and that too is
good. There will be a Spectrum of Quality provided from the ridiculous to
the sublime.

My belief is that we are going to see the entire range of both the
quality of the basic data served and the services provided to enable the
user to locate and retrieve the desired information. Some of it is going
to be on a for-fee basis and I see no reason not to believe that much of
it will also be freely available, driven by much the same forces as now
provide for the "free" information currently available on the Internet. I
envision the work of the professional librarians as concentrating on
certain areas of this enormous collection of information and doing what
they now do so well: provide a professionally organized and maintained
structure in which their users can quickly and easily locate those
particular pieces of information of interest.


Point 4: Fostering the User Communities
---------------------------------------

Stating the obvious: Communities spontaneously arise from every domain of
human activity, whether this is the "community" of your close friends,
your local neighbourhood, your football team or your nation. In the most
general sense these communities are bound together by common interests and
goals. We see this right here in the mailing list or newsgroup you are
reading this on: your interests lie in the domain of "Information on the
Network". We are all members of innumerable communities simultaneously,
each meeting different needs.

I believe that this will be a very powerful and useful force as we start
thinking about the kind of services we will be providing into the
foreseeable future. Those of us interested in, for example, tropical fish
will naturally gravitate to other people with the same interest. It is
our job as the architects of the new information infrastructure (what I
call the "infostructure") to allow our users to locate other members of
their communities and the information which is important to that
community. It also means that we need to create systems which allow one
to "import" the view of that community on a wholesale level such that
each user on entering the system need not have to rebuild the structure
that defines that community.

Point 5: World Views
--------------------

In the past I have said that I have become less and less concerned about
the particular scheme used to catalogue information resources.  As far as
I can tell since Linnaeus (and of course before, but he probably was the
first to use what we now term as the "modern scientific method"), most
classification schemes have been based on the hierarchical model. It is
so common that most of us don't think about it: when we go to the
supermarket and we are looking for ice cream we don't go looking in the
personal hygiene section.... we go into frozen foods. Nature also seems
to work in this way (or at least the way the human brain perceives
"nature"): we have the idea of "dog" and when we recognize those qualities
of "dogness" in an animal we have just encountered we tend to think
"related to dog" or "subdog" :-) Of course we also recognize that there
are some objects that don't always fit as nicely as our intuition would
have us believe: when we see the panda "bear" we think "bear" not
"raccoon" which the taxonomists would tell us is the correct answer. [Ask
any taxonomist about the "guinea pig" (or "cavy" to our British friends)
and watch them run away screaming ;-), but I digress....].

[I can hear you saying "but what's the point?" :-] The point here is that
I find it difficult to imagine a situation where some group could
construct a classification scheme in which an adequate (and in many cases
a one-to-one) mapping could not be performed between it and any other
scheme. However, this is not my field of expertise and I would welcome
other insights into this. I would ask, though, that you take a look at the
bigger picture before turning up the heat. We are working in the most
malleable and versatile environment ever created.  It is trivial for a
computer to transform the number 203.3.4 (BTW, I have no idea if this is
a valid number or what if anything it refers to) in the Dewey Decimal
system to whatever text description or other taxonomical system you are
working in provided the appropriate infostructure exists. These
transformations should occur at the level of the client/server or
client/system not at the user/client boundary. 
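
To make the idea concrete, here is a minimal sketch of such a
transformation done inside the client, so the user only ever sees a
readable label. Everything in it is an assumption made for illustration:
the scheme names, the codes, the labels and the function names are all
invented and come from no real classification schedule.

```python
# Hypothetical crosswalk tables: every code and label below is invented
# for illustration and is not taken from any real classification schedule.
SCHEME_A_TO_LABEL = {
    "A.100": "tropical fish",
    "A.200": "network information services",
}
SCHEME_A_TO_SCHEME_B = {
    "A.100": "B/animals/fish/tropical",
    "A.200": "B/computing/networks/information",
}


def translate(code, crosswalk):
    """Return the equivalent entry in another scheme, or None if unmapped."""
    return crosswalk.get(code)


def present(code):
    """What the client shows the user: a label where one is known."""
    label = translate(code, SCHEME_A_TO_LABEL)
    if label is None:
        # No mapping exists, so fall back to showing the raw code.
        return "unclassified (" + code + ")"
    return label
```

The lookup itself is the easy part; the real effort lies in building and
maintaining the tables, which is the "appropriate infostructure" referred
to above.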

It is often useful to adopt a scheme already in existence; however, I
contend that it is not worth the effort to radically warp a pre-existing
system to your own use if it does not naturally fit. It is important
that your primary user base be comfortable with the system you use. If
other communities feel that it would be useful for them to have access to
your system, but they work under a different scheme then it is in their
interest to perform the conversion.

Each of us as individuals or members of a community will construct our
own paradigm. I believe that any given community will create and use
their own world view. Part of our job is to provide the user with a
paradigm that is both comfortable and easy for them to use. For example,
many of the thousands of gopher users of the world have become accustomed
to interacting with the information services on the Internet in the "gopher
way". Much of this information is native to the gopher system itself,
however there are also parts of the system which reside in other areas of
the infostructure: one can do WAIS and archie searches without ever
knowing that the information has been imported from a completely
different system. The work on URIs et al. is the first pass at
facilitating such inter-operability.

People have on occasion asked me which information system will "win"? My
answer is that you're asking the wrong question. Each of the existing
information services addresses a component of what is a very large and
complex problem and each provides a paradigm useful to some community. It
is my honest hope that in 5 years time the user will never have to hear
the words "archie" or "WAIS" or "WWW". They shouldn't and don't need to
know that these "low-level" systems even exist. What they need to be able
to do is choose from a set of paradigms, pick the one which they are most
comfortable with and use it. The client that needs to contact inter-system
gateways or perform data transformations should do so in a transparent
and seamless manner. We are certainly not at the point where one
monolithic information service could provide us with the services we need.
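
As a rough sketch of what "transparent and seamless" could look like
from the client side, consider a single entry point that fans a query
out to several back ends. The back-end functions here are hypothetical
stand-ins that return canned strings; they are not real archie, WAIS or
WWW clients, and the names are invented for this example.

```python
# Hypothetical back ends: each function stands in for a gateway to one
# low-level service and simply returns a canned result for the query.
def search_file_archives(query):
    return ["file-archive hit for " + query]


def search_full_text(query):
    return ["full-text hit for " + query]


# The client knows which gateways exist; the user does not need to.
BACKENDS = [search_file_archives, search_full_text]


def find(query):
    """Single entry point: the user never names a back-end service."""
    results = []
    for backend in BACKENDS:
        results.extend(backend(query))
    return results
```

A user would call find("tropical fish") and get one combined list back;
adding or retiring a back end changes the BACKENDS list, not what the
user has to learn.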

When I have said in the past that we are working with a "clean slate", I
do not mean that we ignore the literally thousands of years of work that
has already gone into this (the ancient Greeks had libraries). What I
_do_ mean is that we are working with a virtual reality (an oxymoron if
I've ever heard one :-) which allows us to build on the old ways
without being restricted to their limitations.

This brings me to my final (yeah!) point:

Point 6: Some of the Questions
------------------------------

We are only now starting to grasp the meanings of the questions, far less
the answers.

a) The question of the "data elements" has come under the spotlight of
late in both communities: what attributes of the diverse sources of
information can we all agree on to provide the foundations for the kinds
of services described above? 

b) What are the systems that we need and how do they inter-operate? 

c) How and where do we provide the privacy and security required for a
ubiquitous infostructure?

d) How do we shield the user from the ugly underbelly?

e) HOW DO WE FUND ALL OF THIS?


If you have gotten this far, I hope you have not been bored by my
musings. This of course is not intended to be an exhaustive essay (it may
have been exhausting however :-), and I trust that we'll be able to
discuss some of the points raised here.