Document I
Alan Emtage <bajan@bunyip.com> Thu, 15 October 1992 02:03 UTC
Received: from ietf.nri.reston.va.us by IETF.NRI.Reston.VA.US id aa11301; 14 Oct 92 22:03 EDT
Received: from NRI.RESTON.VA.US by IETF.NRI.Reston.VA.US id aa11297; 14 Oct 92 22:02 EDT
Received: from kona.CC.McGill.CA by NRI.Reston.VA.US id aa25915; 14 Oct 92 22:03 EDT
Received: by kona.cc.mcgill.ca (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA13188 on Wed, 14 Oct 92 17:33:25 -0400
Received: from mocha.CC.McGill.CA by kona.cc.mcgill.ca with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA13184 (mail destined for /usr/lib/sendmail -odq -oi -fiafa-request iafa-out) on Wed, 14 Oct 92 17:33:21 -0400
Received: by mocha.cc.mcgill.ca (4.1/SMI-4.1) id AA09092; Wed, 14 Oct 92 17:33:20 EDT
Message-Id: <9210142133.AA09092@mocha.cc.mcgill.ca>
Sender: ietf-archive-request@IETF.NRI.Reston.VA.US
From: Alan Emtage <bajan@bunyip.com>
Date: Wed, 14 Oct 1992 17:33:20 -0400
X-Mailer: Mail User's Shell (7.2.3 5/22/91)
To: iafa@bunyip.com
Subject: Document I
Hello All,

I have modified the first of the IAFA documents to conform to the suggestions from the last IETF in Boston. However, I might not have gotten everything, so suggestions are always welcome. Please note that the next IETF meeting is in just over a month, so please send your suggestions at least a week before the meeting to give us time to make the changes.

This document does _not_ deal with the templates: those are in the second document, which will follow in a few days.

Please note that we still have not received any entries for operating systems other than UNIX which even come close to an adequate level of detail. You will notice that in addition to the actual configuration, subjects like security are handled in some depth. In order to add entries for other OSs (such as VMS, VM/CMS, etc.) we will need equivalent descriptions.

Changes:

A section on "Packaging & Delivery" has been added to this document. However, it may be more appropriate to place this in the User's Document. Give us your thoughts.

The section on "ethics" and "illegal" practices, such as storing copyrighted material without proper release, has been strengthened.

Information on how to "register" an anonymous FTP site has been added. However, it is not clear if this is sufficient as it currently stands.

The section on the use of the anon FTP site to share private information has been strengthened.

There is an entry in the minutes that "more work needs to be done on the _Why you should run an archive_ section". However, I don't recall exactly what this was all about. Maybe somebody who was there could remind me.

To Do:

The references still haven't been resolved.

Here it is. A copy will be put on archives.cc.mcgill.ca in the pub/iafa directory later on tonight.

Enjoy.

-Alan

------------------------

IAFA-WG
A Guide to FTP Site Administration
(IAFA DOC I)
DRAFT 92.10.14

Introduction
------------

As the growth of the Internet continues it has become fair to speak of an "Internet infostructure".
Companion to the extensive physical infrastructure, which is itself responsible for the routing and delivery of messages, the Internet infostructure comprises the growing body of information and the structure that supports it. Much of this infostructure is available equally to all users, while other areas are available only to those participating in specific network-based workgroups. The Internet acts as an enabling technology that makes this wealth of information available to those who know how to access it.

In this document we concentrate on the remote file transfer model for sharing information in an Internet environment. On the net today this model is primarily implemented using the File Transfer Protocol (FTP) [1]. Available from most sites, an FTP service can provide a secure and reliable mechanism for copying specific files from one host to another across the network.

In particular, we aim to provide information to anyone contemplating setting up or maintaining an Internet information archive using the facilities of FTP. A companion document provides specific recommendations on encoding the various types of information to be offered by such a site.

This document does not attempt to define specifically how the file system for an archive should be arranged, since this depends upon the resources available, the information being distributed and the organizational structure of the groups administering the archive. We do, however, offer general guidelines, hints and recommendations.

Site administrators setting up or running anonymous FTP archives should also refer to RFC [* ???? *] "Publishing Information on the Internet with Anonymous FTP". This provides a detailed description of a standardized framework in which valuable additional information can be provided alongside the data that you distribute. Such additional information allows users and automated indexing tools to search for, locate and retrieve desired information from across the Internet more easily.

What is Anonymous FTP?
----------------------

The FTP service has been around since the early days of the Internet and has been a successful one. According to statistics generated by the National Science Foundation (NSF), about 50% of current network traffic (by volume) on the NSFnet backbone in the United States is used for this purpose [2].

The FTP system is designed around a client/server model. Users invoke an FTP client, which then connects to an FTP server process running on a remote host. The server is responsible for verifying the authenticity of the user and for performing the operations the user requests through the client, while enforcing the security and integrity of the host system.

In an ordinary FTP session, one logs into an account on the remote host from which files are to be retrieved, or to which they are to be copied. Commands allow the user to navigate through the remote file system and to copy or delete files. This form of access requires the user to know the name and password of an account on the remote system.

Users of FTP normally must go through this login sequence when connecting to a foreign host. They are then allowed to copy those files to which they have been granted access permission. The login sequence provides basic authentication and security in an open systems environment with hundreds of thousands of interconnected hosts and millions of users. The underlying FTP and Internet communications protocols provide the necessary error checking and thus ensure the integrity of the transferred information.

The basic FTP-based file sharing model has been extended through the creation of a network of universally accessible FTP archive sites. Information at such sites is available to all users of the Internet without the usual authentication step, using the convention of "anonymous FTP".
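
On the FTP control connection, the login sequence described above is a USER command followed by a PASS command, each answered by a numbered reply. The following Python sketch is a toy model of that exchange from the server's side, not a real server. The reply texts are the ones suggested in RFC 959 [1] (real servers often word them differently), while the account table, the set of names treated as anonymous and the email-address test are hypothetical. The last two mirror the practice, described in the next paragraph, of accepting any email-address-like string as the anonymous password.

```python
import re
from typing import Optional

# Reply texts as suggested in RFC 959, section 4.2.
NEED_PASSWORD = "331 User name okay, need password."
LOGGED_IN = "230 User logged in, proceed."
NOT_LOGGED_IN = "530 Not logged in."
BAD_SEQUENCE = "503 Bad sequence of commands."
NOT_IMPLEMENTED = "502 Command not implemented."

# Hypothetical stand-in for the host's real account database.
ACCOUNTS = {"alice": "s3cret"}

# Hypothetical policy: user names treated as the anonymous account.
ANONYMOUS_NAMES = {"anonymous", "ftp"}

# Loose "formatted as an email address" test: user@host, no spaces.
EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+")


class LoginSession:
    """Toy model of the USER/PASS login sequence on an FTP control connection."""

    def __init__(self) -> None:
        self.pending_user: Optional[str] = None  # name given by USER, awaiting PASS
        self.logged_in_as: Optional[str] = None

    def handle(self, command_line: str) -> str:
        verb, _, arg = command_line.strip().partition(" ")
        verb = verb.upper()
        if verb == "USER":
            self.pending_user, self.logged_in_as = arg, None
            return NEED_PASSWORD
        if verb == "PASS":
            if self.pending_user is None:
                return BAD_SEQUENCE  # PASS must follow USER
            user, self.pending_user = self.pending_user, None
            if user.lower() in ANONYMOUS_NAMES:
                # Anonymous login: no real secret, only an email-like string.
                accepted = EMAIL_LIKE.fullmatch(arg) is not None
            else:
                # Ordinary login: the password must match the account's.
                accepted = ACCOUNTS.get(user) == arg
            if accepted:
                self.logged_in_as = user
                return LOGGED_IN
            return NOT_LOGGED_IN
        return NOT_IMPLEMENTED
```

For example, sending "USER anonymous" and then "PASS guest@example.org" yields the 331 reply followed by the 230 reply, while a PASS with no preceding USER yields 503. A real UNIX server would, at this point, also confine the session to the anonymous area and switch to the anonymous account's UID, as described in Part II.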

Under this mechanism, site administrators make a collection of files available to the Internet community by creating a special "anonymous" user account. Such accounts either require no password or accept a variety of strings as a password (for example, many sites allow the password to be any string that is formatted as a valid email address). This allows anyone to connect to such a site and copy information back to their own host.

Anonymous FTP indexing tools
----------------------------

The use of anonymous FTP began as a convention among relatively few sites on the Internet, and the names of sites supporting this mechanism were shared among users of the net through ad hoc methods. With the continued growth of the Internet [3], such methods are now seen to be inadequate. Recently, a number of information indexing and distribution services have been started to aid users in their search for information. It is expected that as the amount of information on the network grows, such services or information resource tools will become increasingly important.

The Internet Anonymous FTP Archive Working Group (IAFA-WG) has been formed under the auspices of the Internet Engineering Task Force to foster better utilization of the anonymous FTP archive mechanism for sharing information on the Internet. Anonymous FTP archives (AFA) are not the ideal method for publishing information on the Internet; however, they have the advantage of being relatively cheap and easy to establish, and they provide near-universal access to their contents. With proper attention from archive site administrators, they provide a relatively simple way to distribute information.

Organization of this document
-----------------------------

This document is divided into two parts. Part I discusses the reasons why an organization might wish to establish an anonymous FTP archive site.
Specific issues, both technical and non-technical, are addressed to help you, the site administrator, determine whether establishing such an archive is appropriate for your site. Part II describes the steps needed to set up and maintain an anonymous FTP archive site. Specific examples for the most common operating system environments are included.

Part I: What is an anonymous FTP archive?

Internet archives are repositories of information of common interest to a group. For example, researchers sharing a common set of data will often put the information in a central location so that it can be accessed by everyone in the group. How this access is performed can vary, but on the Internet the FTP service and the associated remote file sharing paradigm are often used.

Why set up an anonymous FTP archive?

Site administrators set up anonymous FTP archives for any of several reasons:

a) Sharing of useful information. Many sites contain data which their owners would like to make publicly available. Research papers, locally produced software and datasets are some of the most common offerings. An anonymous FTP archive allows you to make this information available to a large audience that would not otherwise be able to access it easily.

b) Caching and redundancy. Sites at the end of a slow network link often set up such an archive to redistribute information obtained from other sites, so that the transfer need not be repeated multiple times for the same piece of information. Large software offerings such as X11 or TeX (which can total several hundred megabytes) are prime candidates for caching at the near end of a slow network link.

c) Visibility. A site's profile can be greatly enhanced by providing a valuable network resource. A useful, large and well-maintained archive site is such a resource to the general Internet community. This can give the group providing the archive higher visibility, which in turn can call attention to other work at that site.

d) Gateway caching. A site with a large internal population of machines that are not themselves directly connected to the Internet (typically connecting through a secure gateway) will often cache packages of interest to that internal population on a machine that is visible both internally and to the rest of the Internet. This can often ease management's fears about the perceived security problems of unrestricted Internet connectivity, while providing a useful service to the Internet as a whole.

Initially, the majority of FTP archives resided on centrally controlled mainframes or minicomputers. The huge growth in the number of workstations and PCs on the Internet has led to the growth of a number of smaller, more site-specific archives. The current population of archives offers everything from small collections of specialized data to collections of hundreds or even thousands of megabytes of information, much of it shadowed or copied from other sites on the Internet.

One must bear in mind that there are certain responsibilities that go along with the operation of an AFA. These include making sure that the resources are being used in a secure, ethical and legal manner. In addition, those administering the system must allocate sufficient staff resources to ensure that these responsibilities are met.

Part II: Setting up and maintaining an anonymous FTP archive site

Once it has been decided that an anonymous FTP account is to be created, it is up to you, the system administrator, to configure the FTP server to allow such access. Exactly how this is done is operating system dependent and may be as simple as creating a password entry with the appropriate information for an FTP pseudo-user account. In most systems today, support for the anonymous FTP account is built into the FTP server program (primarily to enforce security). It is important to bear in mind that once the account is enabled, by its very definition, _anyone_ on the network can access this account through FTP.

You should note that the default anonymous FTP configuration (and corresponding documentation) supplied by your vendor may not always be secure or correct.

NOTE: The practice of using the anonymous FTP mechanism as an "easy" way to distribute private files between parties is STRONGLY discouraged. A number of anonymous FTP archive sites continue to use the mechanism to exchange non-public information without password protection. This is an attempt at "security through obscurity", which is risky at best and potentially very damaging. Sites wishing to exchange information via FTP are urged to create special password-protected, FTP-only accounts for this purpose. Although this requires slightly more administrative overhead, it provides a much more secure environment. We give examples below of how to set up such an account for some common operating systems.

UNIX
----

In most implementations of UNIX, the FTP server, ftpd(8), is launched from inetd(8). The FTP server initially runs as root, changing to the UID of the specified user once the authentication step is complete. The anonymous facility is enabled by adding an account for the user "ftp" to the password file. A typical /etc/passwd entry would look like this:

    ftp:*:67:20:Anonymous FTP account:/home/ftp:/bin/false

Note that a) the password field for the account contains an asterisk ("*") and b) the shell is listed as /bin/false. This combination will prevent remote login access to the account through telnet(1) or the BSD remote commands rlogin(1), rsh(1), rcp(1), etc.

Most UNIX systems will have the FTP server perform a chroot(2) upon startup. This limits file access for the process to the directory subtree specified by the anonymous FTP home directory (in this example /home/ftp, specified as part of the passwd entry above). For security reasons, the ftp home directory and _all_ its subdirectories (e.g.
~ftp/bin and ~ftp/etc) should be owned by an account other than ftp, preferably root. Each of these directories requires read and execute permission, but write permission should be limited to the owner of the directory (i.e. chmod(1) them to 755).

a) ~ftp/bin should contain a copy of the ls(1) program. It should be owned by root and have only execute permission set (i.e. chmod(1) 0111).

b) The directory ~ftp/etc should be owned by root, and the files in it should be read-only (0444). It may _optionally_ contain a passwd and a group file (as specified in passwd(5) and group(5)). For security reasons these files should NOT be copied from /etc on the system in question. They are used by ftpd only to show the owner and group of files held in the anonymous FTP area. Any entries in the ~ftp/etc/passwd and ~ftp/etc/group files should have an asterisk ("*") in the password field, and the home directory and login shell fields in ~ftp/etc/passwd should be left empty. For example, the ~ftp/etc/passwd file could contain an entry of the following form (note that the UID and GID in this entry are NOT the same as in the example above):

    FTP-user:*:31:29:Anonymous FTP Account::

The ~ftp/etc/group file could contain:

    FTP-group:*:29:

IMPORTANT
---------

The ~ftp/etc/passwd and ~ftp/etc/group files are optional and are provided only for the convenience of the anonymous FTP user. They show the apparent owner and group of a file or directory in the anonymous FTP subtree when an ls(1) command is issued from within the FTP session. The true owner and group of a file or directory are given by the /etc/passwd and /etc/group files. It is prudent to change the name, owner ID and group ID so that they are not the same as in the /etc/passwd and /etc/group entries.
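
The passwd rules above are mechanical enough to check automatically. The following Python sketch is one hypothetical way to do so: it parses passwd(5)-style lines (seven colon-separated fields) and reports entries in ~ftp/etc/passwd that lack an asterisk in the password field, still carry a home directory or shell, or reuse a name or UID from the real /etc/passwd. It also checks that the anonymous account itself (conventionally named ftp, the name BSD-derived ftpd(8) implementations look up) has an asterisk password and /bin/false as its shell. The function names are illustrative, and the lines are passed in as strings rather than read from disk, so this is a starting point, not a complete audit tool.

```python
from typing import List, NamedTuple


class PasswdEntry(NamedTuple):
    """One line of a passwd(5)-style file: seven colon-separated fields."""
    name: str
    password: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    name, password, uid, gid, gecos, home, shell = fields
    return PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell)


def check_anonymous_account(system_lines: List[str], account: str = "ftp") -> List[str]:
    """Hypothetical helper: check the anonymous account in the real /etc/passwd."""
    for entry in map(parse_passwd_line, system_lines):
        if entry.name == account:
            problems = []
            if entry.password != "*":
                problems.append(f"{account}: password field should be '*'")
            if entry.shell != "/bin/false":
                problems.append(f"{account}: shell should be /bin/false")
            return problems
    return [f"{account}: no such account in /etc/passwd"]


def audit_ftp_etc_passwd(ftp_lines: List[str], system_lines: List[str]) -> List[str]:
    """Hypothetical helper: check ~ftp/etc/passwd entries against the real file."""
    system = [parse_passwd_line(line) for line in system_lines]
    system_names = {e.name for e in system}
    system_uids = {e.uid for e in system}
    problems = []
    for entry in map(parse_passwd_line, ftp_lines):
        if entry.password != "*":
            problems.append(f"{entry.name}: password field should be '*'")
        if entry.home or entry.shell:
            problems.append(f"{entry.name}: home directory and shell should be empty")
        if entry.name in system_names:
            problems.append(f"{entry.name}: name also appears in /etc/passwd")
        if entry.uid in system_uids:
            problems.append(f"{entry.name}: UID {entry.uid} also appears in /etc/passwd")
    return problems
```

With a system file containing "ftp:*:67:20:Anonymous FTP account:/home/ftp:/bin/false", both checks return an empty list for the example entry "FTP-user:*:31:29:Anonymous FTP Account::". An entry such as "ftp:x:67:20:::/bin/sh" would be reported four times: wrong password field, non-empty shell, reused name and reused UID.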

The contents of the system password and group files should not be made available to remote users at any time, since they provide additional information which may be useful to anyone attempting to compromise the security of a site.

In systems with dynamic libraries (for example, systems running SunOS 4.x), a copy of these libraries and certain devices may also need to be added to the FTP subtree. Consult your documentation to determine whether this is true in your case.

Technical and Administrative Notes
----------------------------------

There are a few areas where potential security or administrative problems might arise when running an anonymous FTP archive site. We will try to address some of these here. If you are not sure of the capabilities of the server software on your particular machine, it is a good idea to consult your system documentation or your software vendor. Many of these problems can be solved by using one of the freely distributable FTP servers now available from various anonymous FTP archives.

Technical Notes:
----------------

a) Whenever you are running an FTP archive (whether it is an anonymous account or not) it is a good idea to use an FTP server that provides logging capabilities. This will allow you to keep track of the various operations that users are performing on your system. Of course, this implies that time to review the logs should also be allocated as part of the day-to-day operation of the system. It should be noted that this logged information usually contains the names or IP addresses of the hosts from which clients connect. In the past, when there were many users on each system, this information didn't reveal much about who was doing what. However, in today's network environment, where many individual computers have in fact become _personal_ computers, this information can identify the actual user with a high degree of probability.
It is considered inappropriate to release this logged information to individuals or groups not directly associated with the maintenance of the archive. Privacy rights have in many respects not been legally defined for computer environments, so it is up to each site administrator to see that privileged information is not consciously or inadvertently distributed.

b) The view of the file system available to the FTP client should be restricted, so that only those files specifically intended for distribution are visible. Ideally, this restriction should be enforced at the lowest possible level, preferably by the operating system itself; application-level enforcement should be avoided. For example, some FTP servers try to restrict the movement of clients by filtering pathname requests. This is a weaker enforcement of access policy than that supplied by the operating system, and alternative servers which use OS support should be used where available.

c) Many sites maintain "incoming" directories which allow the general public to upload information into the archive. These can be very useful for the easy distribution of data; however, they can also be used as a transfer point for files that should not be on your system. Most operating systems allow the creation of directories that are world-writable but not world-readable. If you really want to have an incoming directory, it is a good idea to configure it in this way, so that the site administrator can examine and approve submissions before they are moved to their final location or made generally available. Even with write-only incoming directories, problems can occur: collaborating remote users can agree on filenames beforehand, so that an uploaded file can still be retrieved by the parties involved. It is suggested that quotas be imposed on the FTP user partition so that there is less likelihood of uploaded information causing the partition to become full.

NOTE: We recommend that such incoming directories be used only where they are absolutely necessary and where the benefits of such a directory outweigh the potential problems outlined above.

d) You should periodically check the permissions and ownership of the files in your archive. Many administrators have adopted the practice of transferring ownership of archive files to the FTP pseudo-user. If you do this, the file permissions should be scrutinized to make sure that individual files cannot be modified by that user (unless, of course, that is specifically the intention). Remember, the "FTP user" is anyone using the anonymous account. The replacement of files with corrupted versions (viruses, trojan horses, etc.) has been known to occur. Ideally, no files in the subtree rooted at the ftp home directory should be owned by the FTP user (as defined in the /etc/passwd file).

e) The anonymous FTP subtree of the file system should always be self-contained, because references outside this subtree (for example, symbolic links) cannot be resolved by the chrooted server and are therefore inaccessible to users of the archive. In the specific case of UNIX systems, files or directories should not be hard linked to any part of the file system other than the ftp user subtree.

f) Care should be taken when naming or renaming files in archives. The truism that names should be meaningful takes on greater significance in this environment, since the name is often all that remote users have to work with when trying to discover the contents of a file without actually retrieving it. If you are caching a file from another FTP site, renaming is usually not recommended, since the ability to determine whether the two files contain identical information can be lost. Some operating systems allow the use of whitespace and non-printable characters in filenames, but their use is strongly discouraged since they can make the file inaccessible to the remote user.
Additionally, characters such as '@', '!', '|' or '_' may not be available or may have special significance on remote systems, and should be used with caution.

g) Very large files should be split into smaller pieces when placed in FTP archives. The retrieval of large files can be difficult on unreliable or congested links, since if a failure occurs during the transfer it is usually not possible to restart from the point of failure; the entire transfer has to be repeated. This can be time consuming and wastes network bandwidth. Currently, files of 500-600 kilobytes are usually considered the maximum desirable size. Files larger than this should be split.

h) As the site administrator, you might want to consider creating a CNAME record in the Domain Name System for your anonymous FTP archive. This record usually takes the form "ftp.<your domain>". It allows you to move the archive from one physical host to another without having to notify all users of the move. For example, a CNAME record could make "ftp.cs.mcgill.ca" an alias for the machine "quiche.cs.mcgill.ca". If the archive for the domain cs.mcgill.ca is then moved to another host, only the CNAME record would need to change. In most cases this change would be completely transparent to your users.

i) You should ensure that you are running an up-to-date version of an FTP server, one which is not vulnerable to any known security problems. In particular, older operating system versions may have a vulnerable ftpd. If you identify your server as vulnerable, you should immediately discontinue service until you can install a fixed version.

Administrative Notes:
---------------------

a) Check the contents of your archive regularly to make sure that the files stored there can legally (and ethically) be accessed by the general public. Only information that is freely distributable or in the public domain should be placed in an anonymous FTP archive site.
Information of unknown status should not be made generally available until its status is resolved.

Note: You should not assume that information retrieved from another anonymous FTP archive is meant to be generally available. In the absence of a notice in the material specifying that it is freely redistributable or in the public domain, DO NOT place the files into your archive site: contact the original source of the information (if known) and verify that it can be redistributed in a public forum. If possible, request that the original material be amended to this effect and retrieve the new copy. Note that under current law in most countries, copyright exists inherently with the author of the material, REGARDLESS of any lack of express notification of this copyright in the material. There have been many instances in the past of proprietary and copyrighted material being unknowingly distributed by uninformed archive administrators. This could prove to be an expensive mistake. Know what is in your archive.

b) It is wise to obtain files for caching on your system only from "reputable" sites around the net that are well known and are run in a professional manner.

c) Anonymous FTP site administrators should be aware that the storage of pornographic material in their archives may cause problems of a legal or (more likely) political nature. This is also true of other potentially offensive material, such as that related to explosives, terrorism, etc. There have been a number of cases where the network provider for sites carrying such material has threatened termination of network access until the offending files have been removed.

NOTE: Since the law often varies widely between legal and regulatory jurisdictions on the definition of such things as "obscenity", site administrators are strongly encouraged to check the laws and regulations governing their particular environments.
Additionally, please note that these are ONLY guidelines and do not attempt to provide any legal advice. The IETF and its associated working groups may not be held liable for the breach of any legal code by anonymous FTP site administrators.

Naming, Packaging and Delivery
------------------------------

There are several conventions which are widely used on the Internet to allow the efficient storage and transmission of information stored at anonymous FTP sites. Site administrators should be aware of these, both to aid in the use of their own AFA and for the benefit of their users.

Information stored on AFA sites is often "transformed" for three main reasons:

(a) to "compress" (reduce the size of) the stored information, making more space available on the AFA as well as reducing the actual amount of data transferred across the network;

(b) to "bundle" several files into one larger file, preserving the internal directory structure of the components and allowing the user to transfer one larger object instead of several (sometimes hundreds of) smaller files; and

(c) to convert binary data for transmission. Traditionally, the Internet electronic mail (RFC 822) and USENET (*) protocols did not allow the transmission of "binary" (8-bit) data, and therefore files in binary format had to be transformed into printable 7-bit ASCII before being transmitted in this manner.

Some methods perform a combination of these operations in one easy step. On many systems, file naming conventions are used to enable the remote user to determine the format of the stored information without first having to retrieve the files.

Below we list the more common compression, bundling and transformation conventions used on the Internet. This list is not intended to be exhaustive; however, in all cases public domain or freely available implementations of the programs associated with these mechanisms are currently available on the network.
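
Because these suffixes are added in the order in which the steps were performed (as described at the end of this section), a client or indexing tool can work out from the name alone what must be done to recover the original files. The following Python sketch is a hypothetical helper, not part of any existing tool: it peels off the suffixes named in the list below from right to left and reports the mechanisms in the order in which they must be undone. It recognizes only the suffixes this section mentions; uuencoded files, for instance, have no suffix listed here and are not detected.

```python
from typing import List, Tuple

# Filename suffixes named in this section, mapped to the mechanism that
# produced them. Illustrative only; real archives use other variants too.
SUFFIXES = {
    ".Z": "compress/uncompress",
    ".atob": "atob/btoa",
    ".atox": "atox/xtoa",
    ".tar": "tar/untar",
    "-tar": "tar/untar",
    ".zip": "zip/unzip",
    ".arc": "arc/unarc",
    ".hqx": "binhex",
}


def decode_plan(filename: str) -> Tuple[str, List[str]]:
    """Return the base name and the mechanisms to undo, outermost first.

    Suffixes are appended in the order the steps were applied, so they are
    removed right to left: "foobar.tar.Z" is uncompressed first, then
    un-tarred.
    """
    steps: List[str] = []
    name = filename
    while True:
        for suffix, mechanism in SUFFIXES.items():
            # Require a non-empty base name to remain after stripping.
            if name.endswith(suffix) and len(name) > len(suffix):
                steps.append(mechanism)
                name = name[: -len(suffix)]
                break
        else:
            # No known suffix left: what remains is the base name.
            return name, steps
```

For example, decode_plan("foobar.tar.Z") returns ("foobar", ["compress/uncompress", "tar/untar"]), and a name with no recognized suffix, such as "README", comes back unchanged with an empty list of steps.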

Each mechanism has a forward and a reverse operation; depending on the implementation, a single program often provides both, converting data to and from its original state.

1) compress/uncompress
   Filenames ending in ".Z" normally signify files which have been compressed by the standard UNIX* Lempel-Ziv "compress" utility. An equivalent program called "uncompress" reverses the process and returns the file to its original state. No bundling mechanism is provided, and the resulting files are always in binary format, regardless of the original format of the input data.

2) atob/btoa
   Performs a transformation of ASCII to binary (atob) and the reverse (btoa) in a standard format. Files so transformed often have filenames ending in ".atob". No bundling or compression mechanisms are used.

3) atox/xtoa
   A data transformation standard used to convert binary files to transmittable ASCII format. It is sometimes used in preference to other similar mechanisms because it is more space efficient; however, it is not a compression mechanism per se, merely a more efficient transformation from one format to the other. Files in this format often have the ".atox" extension.

4) uuencode/uudecode
   Transforms binary data to ASCII ("uuencode") and performs the reverse transformation ("uudecode") in a standard manner. Originally used in the UUCP ("Unix to Unix CoPy") mail/USENET system. No bundling or compression mechanisms are used.

5) tar/untar
   Originally a UNIX-based utility for bundling (and unbundling) several files and directories into (and from) a single file (the name stands for "Tape ARchive"). The standard format provides no compression mechanism, and the resulting bundle is always in binary format regardless of whether the constituent files are binary or not. By convention, the filename of a "tarfile" contains the sequence ".tar" or "-tar".

6) zip/unzip
   Often used in IBM PC environments, these complementary programs provide both bundling and compression mechanisms. The resulting files are always in binary format. Files produced by the "zip" program by convention end with the ".zip" filename extension.

7) arc/unarc
   Often used in IBM PC environments, these complementary programs provide both bundling and compression mechanisms. The resulting files are always in binary format. Files stored in this format often have a ".arc" filename extension.

8) binhex
   Often used in the Apple Macintosh environment, the binhex process provides bundling, compression and binary-to-ASCII data transformations. Files in this format by convention have a filename extension of ".hqx".

In some cases, a series of the above processes is performed to produce the final result as stored on the AFA. In such cases tradition holds that the original (base) filename be changed to reflect this, with the associated filename extensions added in the order in which the procedures were performed. For example, a common procedure is to first bundle the original files and directories using "tar" and then apply "compress". Starting with a base name of "foobar", the result would be "foobar.tar.Z". As this is a binary file, transmission over the traditional email or USENET facilities would require a further transformation into printable ASCII by a program such as "uuencode".

Publicizing your Anonymous FTP Archive
--------------------------------------

Having set up an anonymous FTP archive site, how do you let other users know about it? This can be done in several ways, depending on your "target audience".

Sometimes the information on the AFA is of use to a particular group, such as exobiologists or baroque art historians. In that case, mail to the appropriate mailing lists or postings to the appropriate USENET newsgroups are a good place to start, since they are seen by the people to whom the AFA is of most use. Since an AFA is by definition accessible to all users of the Internet, however, a posting to the USENET newsgroup "comp.archives" giving the name of the site, its contents and appropriate use policies (such as preferred connection times, etc.) is also recommended. It is expected that more formal procedures for announcing the availability of an AFA will be created in the future.

Some Existing Information Systems for Anonymous FTP
---------------------------------------------------

Once established, an anonymous FTP archive provides a repository for the data placed there by the group or institution administering the archive. Although the anonymous FTP mechanism is currently the most common way of distributing this kind of information on the network, it is not the only one. Several "Information Services" now in use on the Internet provide the administrator with alternative methods for making his or her files more widely available. They are listed here as suggested enhancements to the operation of the archive.

[* Short descriptions of the various information systems *]

Conclusion
----------

A well organized and maintained anonymous FTP archive can be a valuable asset to any organization and to the Internet as a whole. With proper attention to the security of the system, you can provide a safe environment in which to distribute the work of your own group, or of any of several million users on the network, with minimal effort.

References
----------

[1] RFC 959 Postel, J.B.; Reynolds, J.K. File Transfer Protocol. 1985 October; 69 p.
    (Obsoletes RFC 765 [IEN 149])

[2] NSFnet statistics, available from nis.nsf.net via anonymous FTP in the directory nsfnet/statistics/1992.

[3] RFC 1296 Lottor, M. Internet Growth (1981-1991). 1992 January;