Revised WWW URL and File Naming Scheme

SLAC 15 Dec 1995


This paper was endorsed by the SLAC WWW Technical Committee, May, 1995.


Introduction

SLAC recently installed a new server dedicated to WWW. To take advantage of its greater compute power, we also embarked on a project to move as many production WWW pages* and related files as possible from VM and temporary UNIX space (e.g., /winters/...) into a unified AFS file system on UNIX. We are also using the occasion to regularize the file and URL naming schemes, which have drifted in several different directions over time.

Importance of the Revised Naming

In addition to improving WWW performance, moving page service to UNIX will improve reliability. The VM server is also no longer supported by its authors and is sometimes unstable, much more so than the WWW version. The page migration also supports SLAC's stated direction of moving off VM. Then, too, the "/FIND" in our VM URL is non-standard. It is a holdover from the fact that our VM server was originally based on the one being used at CERN for its FIND interface. Moving to UNIX gives us an easy opportunity to make SLAC's URL syntax conform to the world's.

During the page migration, we have the opportunity to improve the long-term maintainability of the files and make them easier for authors (and users) to manipulate than they currently are on VM. Collecting the files in a single namespace will make long-term system maintenance easier than it is at present (although it may be a bit harder for page maintainers to install files outside their own space, especially at startup, when they may have to learn various AFS administrative skills). The WWW access rules can be simpler, and the URL can reflect the file names in a clear-cut way. Using mnemonically named files, with hierarchical and longer names than VM allows, should also make it easier for page authors to find and update their files, and even for page users to remember the URL when they need to reference it explicitly. Following the naming guidelines below is intended to promote a degree of consistency so that people, especially page maintainers, can infer, to some degree, where to find files. This consistency should foster a comfortable "look and feel" to using the namespace, both actively and passively.

Having a well-structured namespace is expected to lead to even more benefits in the future. For example, we expect it will help in any migration we undertake to the successor(s) of the URL addressing scheme, which developers expect to be more robust over time and across operating system changes than current URLs.

Particulars

In the new naming scheme, production URLs and filenames being served from www1 are identical except that the filename has an additional prefix of:

/afs/slac.stanford.edu/www

which can be abbreviated:

/afs/slac/www

So for the "SLAC Experiment E144 Home Page," the fully qualified URL is:

http://www.slac.stanford.edu/exp/e144/e144.html

and the file is:

/afs/slac/www/exp/e144/e144.html
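
The correspondence between a production URL and its file can be sketched in a few lines of Python. This is an illustration only: the helper name url_to_afs_path is hypothetical, and the sketch assumes the URL is a production URL whose path maps directly under the abbreviated prefix.

```python
from urllib.parse import urlparse

# Abbreviated form of /afs/slac.stanford.edu/www, as given above.
AFS_PREFIX = "/afs/slac/www"

def url_to_afs_path(url: str) -> str:
    """Return the AFS filename for a production URL (hypothetical helper)."""
    # The URL path, e.g. "/exp/e144/e144.html", is appended to the prefix unchanged.
    return AFS_PREFIX + urlparse(url).path

print(url_to_afs_path("http://www.slac.stanford.edu/exp/e144/e144.html"))
# prints /afs/slac/www/exp/e144/e144.html
```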

Production pages and files are generally those linked to from the SLAC Home Page or those hanging off that. Production pages belong in production file space, /afs/slac/www/*, for easier maintenance of the rules file, space, performance, and the information architecture itself.

In the case of potentially large collections, there may be a link from the production /afs/slac/www space into a group repository. E.g.:

/afs/slac/www/esh

is a symbolic link to ES&H's group space in:

/u/sa/eshdb/www/

but the URL for accessing the files looks the same as if they actually resided in production WWW space. For example, here is the URL for the ES&H Home Page:

http://www.slac.stanford.edu/esh/esh.html

The global WWW visibility of the files should still be clear because all the files reside in the /eshdb/www subdirectory.
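
The effect of such a link can be tried out safely in a throwaway directory. The following Python sketch assumes a POSIX system where symbolic links are permitted; it builds a small stand-in tree under a temporary directory (not the real AFS or /u space) and shows that the path served from production space resolves to a file in the group repository. The function name resolve_in_sandbox is hypothetical.

```python
import os
import tempfile

def resolve_in_sandbox() -> tuple:
    """Mimic the ES&H arrangement in a temporary tree and resolve a file through the link."""
    root = os.path.realpath(tempfile.mkdtemp())
    group_space = os.path.join(root, "u", "sa", "eshdb", "www")
    production = os.path.join(root, "afs", "slac", "www")
    os.makedirs(group_space)
    os.makedirs(production)
    # The page itself lives in the group repository.
    with open(os.path.join(group_space, "esh.html"), "w") as f:
        f.write("<title>ES&H Home Page</title>\n")
    # Production space holds only a symbolic link named "esh".
    os.symlink(group_space, os.path.join(production, "esh"))
    served = os.path.join(production, "esh", "esh.html")
    actual = os.path.realpath(served)
    return os.path.relpath(served, root), os.path.relpath(actual, root)

served, actual = resolve_in_sandbox()
print(served)  # afs/slac/www/esh/esh.html
print(actual)  # u/sa/eshdb/www/esh.html
```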

Note that this practice of linking from production to group space should only be used for very large collections of files when putting them in the production Web page space is impractical. Symbolic links connecting globally visible Web space to space that is normally seen only by those logged onto a SLAC host may lead to unpleasant surprises. Generally speaking, it is better practice to link from more restricted (more secure) to less restricted (less secure) space.

For example, you should link from /usr/local/doc/forms/acctform.ps to /afs/slac/www/comp/form/acctform.ps. This way, the link clearly points out of a directory that can usually only be seen by people logged in to a SLAC host (or, soon, also by those with SLAC-cell AFS privileges) and into space well known to be displayed to anyone on the Web.

The Test SLAC Home Page and those hanging only off that are another exception to being located in the production WWW space. They reside in a separate test space in UNIX to be determined.

Personal home pages are a third exception and reside in the person's ~$USERNAME/public_html/$FILENAME sub-directory, where $USERNAME is the user's UNIX username, e.g., cottrell, and $FILENAME is the name of the user's home page, usually $USERNAME, home.html, or index.html.

Generally speaking, there should not be links from production space to files in a person's home directory. That space is for testing and sharing informally.

Guidelines

Many files other than the examples above need to be placed into SLAC's WWW file-naming structure. This paper draws on the experience gained working with diverse groups over the past couple of years to articulate guidelines for naming files and, hence, their URLs. The document deals only with text and related files (e.g., PostScript and image), those that are passively displayed. It does not treat naming CGI files, for which there is less experience.

Organization

First are guidelines for organizing the production WWW space:
  1. Divide files into two basic categories, "functional" ones and "organizational" ones. Functional files treat subjects of interest to a set of SLAC users. Organizational files describe some part of the SLAC organizational structure. Note that organizational files often change faster in more basic ways than functional files, which may be more likely to require small, technical or procedural updates. There is overlap between the two categories.
  2. Reflect the functional/organizational distinction at the top of the SLAC WWW file space (as well as lower down). See "Naming Proposal Examples" below. Documents published for one or more major audiences outside a group itself usually belong in "functional" space. Documents targeted for the group itself usually belong in "organizational" space, along with the group or departmental home page. Pages in functional space are usually relatively formal. Those in group space may be more casually put together. Having documents in "functional" space is an honor; but because of its broader audience, there is a greater responsibility to have them reflect well on SLAC by being relatively polished and conformant to SLAC guidelines. (See the WWW Style Committee Report.) Normally small groups will start out in group space due to resource limitations.
  3. Files in group space are named /grp/$CODE, where $CODE is usually the two or three character BINLIST code. There are currently seventy-four of these in use. (See "Appendix A.")

    From other contexts these codes are often already recognized by SLAC people, e.g., cd, pur, or scs. The BINLIST codes frequently focus on operational components of the SLAC organizational structure that are less likely to change than the hierarchical levels above them. In any case, keeping the hierarchy flatter means there are fewer components subject to change than if the name reflected all levels of today's organization chart.

    There may be a few exceptions to using $CODE. For example, the BINLIST code for the SLAC Library is lib and for TechPubs is pub but these have other contexts in UNIX (e.g., /usr/local/lib and /usr/local/pub). Also, the group may be particularly identified with its name more than its code, e.g., the Library.

  4. Access to some files should be restricted to SLAC users. The best we can do now fairly securely is to restrict WWW usage to those logged in to a host with a SLAC IP number (134.79...) when the fully qualified URL (and related filename) includes "slaconly", e.g., /pubs/slaconly/tip. Note, however, that users with appropriate AFS privileges may read any file in /afs/slac/www space including those with "slaconly" in their names.

    Remember that by default all files in SLAC WWW space are visible to anyone using the Web around the world.
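
The "slaconly" rule in item 4 can be expressed as a small check. The Python sketch below is not the server's actual rules file; the function name may_serve is hypothetical, and reading "134.79..." as the whole 134.79.0.0/16 range is an assumption. As noted above, it says nothing about users with AFS privileges, who can read the files directly.

```python
import ipaddress

# Assumption: "134.79..." is read as the whole 134.79.0.0/16 range.
SLAC_NETWORK = ipaddress.ip_network("134.79.0.0/16")

def may_serve(url_path: str, client_ip: str) -> bool:
    """Hypothetical check: paths that include "slaconly" need a SLAC address."""
    if "slaconly" not in url_path:
        return True  # by default, files are visible to the whole Web
    return ipaddress.ip_address(client_ip) in SLAC_NETWORK

print(may_serve("/pubs/slaconly/tip", "134.79.16.9"))  # True
print(may_serve("/pubs/slaconly/tip", "192.0.2.7"))    # False
print(may_serve("/pubs/beamline", "192.0.2.7"))        # True
```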

Names

Following are some criteria for naming the sub-directories and files:
  1. Particularly "leftwards" in the filename (towards the top of the file hierarchy), keep names short but mnemonic. Because of name-length limitations in AFS volume names that are particularly important in re-establishing the file system after certain crashes, short subdirectory names or clear abbreviations of those names are important. Short names are also faster to type and consistent with the UNIX style of labeling things. On the other hand, very long names may cause browser displays to break.
  2. Once names for types of things have been chosen, use them consistently. Name the same kind of subdirectory the same wherever it appears, e.g., /pubs/figure, /exp/sld/figure, /slac/www/resource/figure, and /grp/scs/net/figure for figures that are displayed using Tony Johnson's CGI script. Being consistent helps people recognize directories and files when they encounter them or even unearth them via a find command.+
  3. Make the "master" file in a subdirectory the same name as the subdirectory above it, e.g., /slac/www/resource/resource.html. The "master" file may or may not be a home page. It is the file you want someone trudging down the file hierarchy to look at first for an understanding of what the subdirectory's all about. Sometimes a subdirectory may not (yet) have a "master" file, e.g., /slac/www/wwwtech/wwwstyle. Sometimes a home page, e.g., home.html, may be more appropriate.

    In any case, index.html is not appropriate since it is used by various servers to show those files in the subdirectory that may be displayed to the Web and suppress the automatic index (specified, if enabled, by a terminal / on the URL for other than the server default home page).
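
The "master" file convention in item 3 is mechanical enough to sketch. The helper name master_file below is hypothetical, and the sketch assumes the master file is an HTML file; a subdirectory may, of course, have no master file at all or use home.html instead.

```python
import posixpath

def master_file(subdir: str) -> str:
    """Return the conventional "master" file path for a subdirectory (hypothetical helper)."""
    subdir = subdir.rstrip("/")
    name = posixpath.basename(subdir)  # last component, e.g. "resource"
    return posixpath.join(subdir, name + ".html")

print(master_file("/slac/www/resource"))  # /slac/www/resource/resource.html
```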

Naming Styles

Finally, here are some recommendations for naming styles:
  1. Restrict subdirectory names to lower case. Path names may be used in programming (e.g., see the environment variable, $PATH). Besides, it's not always possible to anticipate "correct" file name capitalization, even if you were to memorize The Chicago Manual of Style, because of the limits on UNIX names. Using a consistent naming strategy saves time and frustration because it often results in reduced "finger jitter" when you try to anticipate the capitalization as you enter the names and, in the end, a total of fewer commands entered to access the files.
  2. Use the generic singular form rather than the plural in pathnames and generally filenames, unless a name is the name of something well-known in the plural, e.g., /pubs or /stats. Again, people get finger jitter trying to anticipate whether the filename is singular or plural. Consistency is a help here unless the name is typed before you think of its number.
  3. Mash terms together without hyphens ("-") unless the result is misleading to parse, e.g., use /slac/hottopic, not /slac/hot-topic; but, /emp/emp-opp, not /emp/empopp. Again, people get finger jitter trying to decide when to include or not include a hyphen. As long as it's clear, typing fewer characters rather than more is faster. Also, shorter names can provide more context, e.g., in the NeXT filename browser.
  4. Use periods in filenames to indicate filetypes, e.g., .html, .ps, .ps.Z, .pdf, and .gif. Otherwise avoid them.
  5. Have the file owner choose the name of the file to taste, e.g., with free use of capitalization, hyphens, and longer names. There are advantages to these in readability, and the filenames themselves are much more like a form of document title than the path names. Making a distinction between pathname and filename conventions is fairly easy to do.
  6. If you're using a formatted-file-to-HTML converter like rtftohtml, don't fight what it does. If you're creating HTML files yourself, there's a tradeoff between one larger file (e.g., more time to transfer over the net, increased skills needed to use) and several smaller ones (e.g., more name space used, more URLs and files to keep track of, and increased difficulty in searching).
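
Some of the subdirectory recommendations above (lower case, no hyphens, no periods) lend themselves to a quick automated check. The Python sketch below uses hypothetical function names and covers only those three points. It cannot judge singular versus plural, and a flagged hyphen is only advisory, since names such as /emp/emp-opp are accepted when the mashed form would be misleading.

```python
def check_component(name: str) -> list:
    """Return the naming-style problems found in one subdirectory name."""
    problems = []
    if name != name.lower():
        problems.append("not lower case")
    if "-" in name:
        problems.append("contains a hyphen")
    if "." in name:
        problems.append("contains a period")
    return problems

def check_dirs(dirpath: str) -> dict:
    """Map each offending component of a directory path to its problems."""
    report = {}
    for part in dirpath.split("/"):
        found = check_component(part) if part else []
        if found:
            report[part] = found
    return report

print(check_dirs("/slac/hottopic"))   # {}
print(check_dirs("/slac/hot-topic"))  # {'hot-topic': ['contains a hyphen']}
print(check_dirs("/Comp/Unix"))       # {'Comp': ['not lower case'], 'Unix': ['not lower case']}
```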

Naming Examples

The following recommendations put these guidelines to work in the SLAC WWW AFS space.

Some key pages:

     slac.html                          the default SLAC Home Page
     /slac/disclaimer.html              the SLAC disclaimers
     /slac/slacinst/institution.html    the SLAC Institutional Page

Some first-level subdirectories (functional first):

     /accel                  for accelerator
     /archive@                "  archives
     /bis                     "  business information systems
     /comp                    "  computing
     /emp                     "  employment
     /esh                     "  environment, safety, and health
     /exp                     "  experiment (multi-institutional)
     /gen                     "  general information
     /library                 "  library
     /phys                    "  physics
     /pubs                    "  SLAC publications, images,...
     /slac                    "  SLACwide information
     /spires                  "  specifically SPIRES applications
     ...                      "  more functional categories
 
     /org                     "  organization-specific information
 
     /grp                     "  group- or department-oriented information

Some second-level subdirectories (functional first):

     /accel/pepii            (multi-institutional)
 
     /archive/1994
     /archive/1995
 
     /bis/acct
     /bis/budget
     /bis/commits
     /bis/pers               for personnel systems
     /bis/procure
     /bis/snap
     /bis/stores
 
     /comp/future
     /comp/intro
     /comp/mac
     /comp/net
     /comp/pc
     /comp/phys
     /comp/security
     /comp/telecom
     /comp/unix
     /comp/vendor
 
     /emp/emp-opp            for employment opportunities
 
     /esh/bull
     /esh/slaconly
 
     /exp/e143
     /exp/e144
     /exp/e154
     /exp/mq
     /exp/sld
 
     /gen/area               for Local Area Resources
     /gen/edu
     /gen/map
     /gen/meeting
     /gen/tour
     /gen/visit
 
     /library/libnews.html
 
     /pubs/beamline
     /pubs/figure
     /pubs/slaconly
 
     /slac/hottopic
     /slac/slacinst
     /slac/www
 
     /spires/doc
     /spires/tool
     /spires/form
 
     /org/chart
 
     /grp/bbr                (SLAC group)
     /grp/cd
     /grp/efd
     /grp/mfd
     /grp/pep                (SLAC group)
     /grp/per
     /grp/scs
     /grp/thp

A few lower-level subdirectories (functional first):

     /bis/procure/req/slaconly
 
     /comp/intro/scsc-serv
     /comp/telecom/phone-dir
     /comp/telecom/phone-users-guide
 
     /exp/sld/figure/top20
 
     /gen/meeting/ssi
 
     /pubs/figure/top20
 
     /slac/www/gen
     /slac/www/resource
     /slac/www/resource/icon
     /slac/www/stats
     /slac/www/swug
     /slac/www/tool
     /slac/www/tool/search
     /slac/www/wwwpolicy
     /slac/www/wwwstyle
     /slac/www/wwwtech
     /slac/www/wwwtech/doc
     /slac/www/wwwtech/doc/notes
 
     /grp/scs/net
     /grp/scs/scsc
     /grp/scs/systems

A few examples of conventional file names:

     /accel/pepii/home.html

     /slac/www/resource/resource.html

     /grp/scs/mission.html
     /grp/scs/orgchart.ps

Some exceptions:

     /grp/techpubs
     /grp/library
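
The /grp/$CODE convention, together with the two exceptions just listed, might be captured as follows. This is a sketch with hypothetical names; the exception table contains only the Library and TechPubs cases mentioned in this paper.

```python
# BINLIST codes whose group directory uses a name instead of the code.
EXCEPTIONS = {"lib": "library", "pub": "techpubs"}

def group_dir(binlist_code: str) -> str:
    """Return the group-space directory for a BINLIST code (hypothetical helper)."""
    code = binlist_code.lower()  # subdirectory names are lower case
    return "/grp/" + EXCEPTIONS.get(code, code)

print(group_dir("SCS"))  # /grp/scs
print(group_dir("LIB"))  # /grp/library
print(group_dir("PUB"))  # /grp/techpubs
```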

Some Implementation Details

Contact SCS for initial creation of your directory in AFS WWW space. You will be expected to know basic AFS file management commands. SCS will establish appropriate Access Control Lists (ACL) and groups. Groups will generally come in pairs, where one group controls who is in the other, which actually has the write privileges into the AFS directories for your pages.

For more information on installing pages, see "

Recommendations in Closing

It is recommended that subdirectory names at the first two WWW levels be designed with a "WWW AFS Registrar" designated by the WWW Policy Group, who works to keep the high level taxonomy sensible and consistent in light of specific user needs and system requirements. Groups may find designating their own group registrars (or advisors) useful.

In the short run Joan Winters has agreed to serve as the registrar with a backup registrar in the works and Pat Kreitz as the "higher authority." In the normal course of events, one working-day naming turnaround is the goal.

A place or places for source, .../src/..., should be provided in this WWW AFS name space for files that are not self-defining, e.g., for .ps or .pdf. In some cases a pointer file to where the source is kept may suffice, but this may well prove to be less stable over time.

It is also recommended that SLAC develop tools to ease migration of pages through the system, including providing for file/URL renaming over the years. Cleaning out the obsolete files (and sometimes putting them into /archive/$YYYY, where $YYYY is the year of last update) will keep the WWW information space easier to read and use by maintainers and then by users.
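
As one small example of such a tool, the /archive/$YYYY directory for an obsolete file could be derived from the file itself. The Python sketch below is only an illustration: the function name archive_dir_for is hypothetical, and treating the file's modification time (in local time) as its "year of last update" is an assumption.

```python
import datetime
import os

def archive_dir_for(path: str) -> str:
    """Return the /archive/$YYYY directory for a file (hypothetical helper).

    Assumption: the file's modification time stands for its last update.
    """
    mtime = os.path.getmtime(path)
    year = datetime.datetime.fromtimestamp(mtime).year
    return f"/archive/{year}"
```

A file last modified in mid-1995, for instance, would map to /archive/1995.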

Acknowledgements

This work is an outgrowth of an effort started by Tony Johnson. See "New URL Scheme for SLAC WWW Server". He has continued to be very helpful in discussions along the way. In addition, real world examples provided by Ilse Vinson, Jay Venti, Brooks Collins, PA Moore, Laurie Gennari, Diana Gregory, George Crane, and others have helped significantly to flesh out the model. Feedback from the WWW Technical Committee has been very useful in developing some of the concepts. Any inadequacies in the document are, of course, my own.

* Hereinafter the word "page" may stand for "related files" as well.

+ Note that CGI scripts may lay constraints on filenames along with the goal of making them recognizable to people. E.g., .../figure/x is a general form used by TonyJ's CGI script to summarize figures for display, where x = top20, local, cern, or other appropriate name.

@ The string selected is the longer "/archive" rather than the shorter "/arch" because /arch brought "architecture" to several minds. "ARCH" and "archname" are already used in SLAC's UNIX system to indicate platform architecture.

The purpose of the /archive hierarchy is for important files that no longer have a current use, e.g., pages for 1995's "Take Our Daughters to Work Day", which is past, or "The SLAC WWWizards" page, which SLAC no longer links but has historical importance. This hierarchy is like a records storage facility for important documents that you no longer keep in your office. It is not intended to be comprehensive.

Another category of old documents is those still having some clear amount of use, e.g., an archive of old "list" email. Consider creating a .../archive sub-directory below the topic's major directory for these items.


Appendix A

On April 7, 1995, Diana Gregory supplied the following list of group codes valid in the SPIRES BINLIST subfile:

 AAO Affirmative Action Office
 ACC Accounting Office
 AD  Accelerator Department
 BAS Business Applications Support Group
 BBR BABAR
 BSD Business Services Division
 BU  Budget Office
 CAF Cafeteria
 CB  Crystal Ball Project
 CD  Controls Department
 CG  Computation Research Group
 CMS Central Lab Machine Shop
 CYE Cryogenics Engineering
 CYO Cryogenics Operations
 DO  Director's Office
 DOE U.S. Department of Energy
 EA  Experimental Group A
 EB  Experimental Group B
 EC  Experimental Group C
 ED  Experimental Group D
 EE  Experimental Group E
 EFD Experimental Facilities Department
 EG  Experimental Group G
 EH  Experimental Group H
 EI  Experimental Group I
 EK  Experimental Group K
 ESA End Station A Users
 ESH Environment Safety and Health Administration
 EWM Environmental Protection and Waste Management(ESH)
 FAC Facilities Office
 FD  Palo Alto Fire Department Station 7
 IBM International Business Machines
 IRM Information Resource Management and Technology Transfer
 IS  Information Services
 KLY Klystron and Microwave
 LHT Liquid Hydrogen Targets
 LIB Library
 LTR Low Temperature Materials Research
 M-2 Mark II Experiment
 MD  Mechanical Design
 ME  Mechanical Engineering/Alignment
 MED Medical Department
 MET Metrology
 MFD Mechanical Fabrications Department
 MU  Mechanical Utilities
 NPS Nuclear Physics at SLAC
 OHP Operational Health Physics (ESH)
 PAD ESH Planning and Assessment Department
 PAO Public Affairs Office
 PCD Power Conversion Department
 PE  Plant Engineering
 PEL Physical Electronics
 PEP Positron Electron Project
 PER Personnel Department
 PMS Plant Maintenance Services
 PRC Property Control
 PUB Publications
 PUR Purchasing
 RD  Research Division
 RPG Radiation Physics (ESH)
 SCS SLAC Computing Services
 SEC Security
 SHA Safety, Health and Assurance Department
 SLC SLAC Linear Collider Project
 SLD SLAC Large Detector
 SSP Summer Science Program
 SSR Stanford Synchrotron Radiation Lab
 TD  Technical Division
 THP Theoretical Physics
 TPC Time Projection Chamber
 TR  Travel
 TSP Accelerator Theory and Special Projects
 TT  Tiger Team
 VAC Vacuum Group

Joan M. Winters