Bibliographic management in digital projects

When creating and maintaining a bibliography for a digital project, keep four principles in mind while choosing bibliographic management software.  Paying attention to these principles from the beginning will help your project's bibliography remain internally consistent, interoperate with the other data in the project, and be as useful as possible for search, browsing, display, and export.

What are some of the options?
The Stanford University Libraries support and can assist with several different bibliographic management programs through subscription, instruction, and technical support.  These are EndNote, Mendeley, RefWorks and Zotero.  If you are planning to use a different tool, be sure support is available from the developer. Using one of the tools supported by the Stanford University Libraries goes a long way toward abiding by the following principles.  But the principles themselves are important to understand and to keep in mind.

Principle 1: Data structure is essential
Bibliographic data is highly structured: titles, authors, dates, pages, places of publication, etc., are all distinct elements of a standard bibliographic citation.  In order to do something with a citation, its information must be maintained in a structured format (as opposed to, say, unstructured text): each element of a bibliography should be clearly identified in some standard, persistent way so that an automated program can act on it -- whether to display it in a particular way (e.g., in italics if it's a book title), to sort it (e.g., by publication year), or to display or hide it (e.g., an annotation or an internal note).  Library catalogs and the bibliographic management programs mentioned above do all of this internally.
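As a minimal sketch of what "structured" means in practice, the Python below keeps each element of a citation in its own named field, so that a program can sort by year, italicize a title, or hide an internal note.  The Citation class, its field names, and the sample records are illustrative assumptions, not the internal format of any of the tools named above.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Citation:
    """One bibliographic record; every element lives in its own field."""
    author: str
    title: str
    year: int
    place: str = ""
    note: str = ""  # internal note, hidden from public display


def to_html(c: Citation) -> str:
    """Render a book citation, italicizing the title and omitting the note."""
    return f"{escape(c.author)}. <i>{escape(c.title)}</i>. {escape(c.place)}, {c.year}."


# Invented sample records, for illustration only.
records = [
    Citation("Doe, Jane", "A Later Book", 2010, "Boston", note="check edition"),
    Citation("Roe, Richard", "An Earlier Book", 1995, "London"),
]

# Sorting and display both rely on the fields being separate.
by_year = sorted(records, key=lambda c: c.year)
html_lines = [to_html(c) for c in by_year]
```

None of this would be possible if each citation were stored as a single line of free text.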

It is also possible to structure data in a simple spreadsheet, in which structure comes in the form of rows and columns.  (This is not a great choice, but not a terrible one.)  The worst choice is to use something like a word processing program, which generally lacks options for structured data altogether.
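A spreadsheet saved as CSV can still be read as structured data, since each column header names a field.  The sketch below uses Python's standard csv module; the column names and the sample rows are assumptions made for illustration.

```python
import csv
import io

# Stand-in for a spreadsheet exported as CSV; the rows are invented.
sheet = io.StringIO(
    "author,title,year\n"
    "\"Roe, Richard\",An Earlier Book,1995\n"
    "\"Doe, Jane\",A Later Book,2010\n"
)

# DictReader uses the header row to label every cell with its field name.
rows = list(csv.DictReader(sheet))

# Because the year sits in its own column, filtering on it is simple.
recent_titles = [r["title"] for r in rows if int(r["year"]) >= 2000]
```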

Principle 2: Mind how the data gets in
Whether your citations are entered by hand or automatically downloaded from some catalog or database, all of the tools cited above can help.  Consider whether a single person or multiple people will be gathering data and maintaining the collection of citations.  Using stand-alone tools (i.e., those that store data on an individual's computer) for a jointly curated bibliography will likely lead to duplication, or gaps, or both.  Merging bibliographies (which requires de-duping and possibly re-formatting) is generally a nightmare!
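To give a sense of why merging is painful, here is a rough sketch of de-duplicating two contributors' lists.  It treats records as duplicates when a normalized title and the year match; that matching rule, the merge and dedupe_key helpers, and the sample data are all assumptions, and real bibliographies need more careful matching than this.

```python
import re


def dedupe_key(record: dict) -> tuple:
    """Build a crude matching key: lowercased title without punctuation, plus year."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    title = " ".join(title.split())  # collapse runs of whitespace
    return (title, record["year"])


def merge(*bibliographies: list) -> list:
    """Combine lists of records, keeping the first copy of each duplicate."""
    seen = set()
    merged = []
    for bibliography in bibliographies:
        for record in bibliography:
            key = dedupe_key(record)
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


# Invented sample data: the same book entered slightly differently by two people.
alice = [{"title": "An Earlier Book", "year": 1995}]
bob = [
    {"title": "an earlier  book.", "year": 1995},
    {"title": "A Later Book", "year": 2010},
]

combined = merge(alice, bob)
```

Even this toy version has to guess at what counts as "the same" record, which is the part that a shared, centrally stored bibliography avoids.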

If the principal source of your project's citations is one or two known databases, then it's worth taking the time to experiment with the tools available to see which ones work best with those particular sources: which are simplest to use, which capture and display the citations in the most straightforward and "clean" way, etc.  Beware of tools that take structured data (for example, edition statements or multiple authors) and lump it into a single, unstructured field, such as a generic note field.

Principle 3: Mind how the data gets out
At some point, your bibliography will need to be transformed into part of something larger like a digital collection or a database.  The individual citations, and each of their individual parts, will be transformed to some other format appropriate for display (e.g., HTML) or for searching, sorting and browsing.  Digital library workers can often do such transformations in reasonably straightforward ways, as long as the export format is also structured rather than plain text.  Experiment with a few sample records to see whether they travel the complete import – export – import cycle without losing any data or structure.
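One way to run that experiment is to export a few records, import them back, and compare the result with the original.  The sketch below uses JSON purely as a stand-in for whatever structured export format your tool offers; the export_records and import_records functions are hypothetical placeholders for the tool's own export and import steps, and the sample record is invented.

```python
import json


def export_records(records: list) -> str:
    """Hypothetical export step: serialize records to a structured text format."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def import_records(text: str) -> list:
    """Hypothetical import step: parse the exported text back into records."""
    return json.loads(text)


# Invented sample record with several authors and a non-ASCII character.
original = [
    {
        "authors": ["Doe, Jane", "Roe, Richard"],
        "title": "Études sur un sujet",
        "year": 2010,
        "edition": "2nd ed.",
    }
]

round_tripped = import_records(export_records(original))

# Collect any field whose value changed or disappeared along the way.
lost = [
    (i, field)
    for i, record in enumerate(original)
    for field in record
    if round_tripped[i].get(field) != record[field]
]
```

An empty list of lost fields suggests the format preserves structure for the cases you tested; records with multiple authors, edition statements, and accented characters make good test cases.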

Principle 4: Standardization is the key to interoperability
It is important to use standard bibliographic fields in a structured way, and it is also extremely useful to include standard numbers or other identifiers when they are available.  ISBNs, ISSNs, DOIs (“Digital Object Identifiers”), and PURLs (“Persistent URLs”) can often be turned automatically into links – a simple, straightforward way to turn your bibliography into an interactive, interoperable part of the digital project.
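As a small sketch of turning identifiers into links, the Python below prefixes an identifier with a resolver address.  The doi.org address is the standard DOI resolver; the ISSN Portal address pattern, the identifier_link helper, and the sample values are assumptions that should be checked before use.  ISBNs and PURLs are left out here: ISBN links depend on which catalog you choose to point at, and a PURL is already a link.

```python
from html import escape
from urllib.parse import quote

# Resolver prefixes. doi.org is the standard DOI resolver; the ISSN Portal
# pattern is an assumption that should be verified before relying on it.
RESOLVERS = {
    "doi": "https://doi.org/",
    "issn": "https://portal.issn.org/resource/ISSN/",
}


def identifier_link(kind: str, value: str) -> str:
    """Return an HTML link for a known identifier type, or the bare value otherwise."""
    value = value.strip()
    prefix = RESOLVERS.get(kind.lower())
    if prefix is None:
        return value  # unknown identifier type: leave it as plain text
    url = prefix + quote(value)
    return f'<a href="{url}">{kind.upper()} {escape(value)}</a>'


# Sample identifiers, for illustration only.
doi_link = identifier_link("doi", "10.1000/182")
unknown = identifier_link("local-id", "rec-42")
```

This only works when the DOI or ISSN sits in its own field; an identifier buried in a free-text note would first have to be found and extracted.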


Likewise, with subject headings, author names, uniform titles, etc., it is wise to rely on well-known standards like the Library of Congress authority headings that at least contain the possibility of being automatically linked to other catalogs or resources.  You may prefer the spelling “Tolstoï” on linguistic or esthetic grounds, but if the Library of Congress prefers “Tolstoy,” then your Russian novelist will have a hard time being properly identified. Emerging standards such as VIAF (the Virtual International Authority File) may make such cross-linking more automatic in the future.  The more your project is able to include identifiers from standard, authoritative sources, the better it will be able to integrate with the rest of the information world and other digital projects.
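The Tolstoy example can be sketched as a lookup from variant spellings to one authorized form.  In the hypothetical table below, only the preferred spelling “Tolstoy” comes from the text above; the list of variants and the fold and authorized_form helpers are assumptions, and a real project would draw its headings from an authority file rather than a hand-made dictionary.

```python
import unicodedata

# Hypothetical variant table; a real one would come from an authority file.
AUTHORIZED = {
    "tolstoi": "Tolstoy",
    "tolstoj": "Tolstoy",
    "tolstoy": "Tolstoy",
}


def fold(name: str) -> str:
    """Lowercase a name and strip diacritics so 'Tolstoï' matches 'tolstoi'."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold().strip()


def authorized_form(name: str) -> str:
    """Return the authorized spelling if known, else the name unchanged."""
    return AUTHORIZED.get(fold(name), name)
```

Storing the authorized form alongside whatever spelling you prefer for display keeps the record linkable without giving up your own editorial choices.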