Indexing MARC records for SearchWorks - navigating Open Source Software

The (meta)data underneath SearchWorks is largely based on our MARC records from Symphony. MARC records are exported from Symphony, then slurped up by an application called SolrMarc, which transforms the MARC data into an index for the Solr search engine used by SearchWorks.
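To make that pipeline a little more concrete, here is a toy sketch in Python of the "MARC record in, Solr document out" step. This is not SolrMarc's code (SolrMarc itself is written in Java), and everything in it is an assumption for illustration: the record layout, the `FIELD_SPECS` mapping table, the Solr field names, and the function names are all made up.

```python
from typing import Dict, List, Tuple

# Toy stand-in for one MARC record: tag -> list of field occurrences,
# each occurrence a dict of subfield code -> value.
# Control fields (like 001) use the made-up code "_" for their value.
MarcRecord = Dict[str, List[Dict[str, str]]]

# Hypothetical mapping: Solr field name -> (MARC tag, subfield codes to join).
FIELD_SPECS: Dict[str, Tuple[str, str]] = {
    "id": ("001", "_"),
    "title_display": ("245", "ab"),
    "author_display": ("100", "a"),
    "topic_facet": ("650", "a"),
}


def extract(record: MarcRecord, tag: str, codes: str) -> List[str]:
    """Return one string per occurrence of `tag`, joining the requested subfields."""
    values = []
    for field in record.get(tag, []):
        parts = [field[code] for code in codes if code in field]
        if parts:
            values.append(" ".join(parts))
    return values


def to_solr_doc(record: MarcRecord) -> Dict[str, List[str]]:
    """Flatten a MARC-like record into a dict shaped like a Solr document."""
    doc = {}
    for solr_field, (tag, codes) in FIELD_SPECS.items():
        values = extract(record, tag, codes)
        if values:  # skip Solr fields with no matching MARC data
            doc[solr_field] = values
    return doc


sample: MarcRecord = {
    "001": [{"_": "a123"}],
    "245": [{"a": "Moby Dick,", "b": "or, The whale"}],
    "100": [{"a": "Melville, Herman"}],
    "650": [{"a": "Whaling"}, {"a": "Sea stories"}],
}

doc = to_solr_doc(sample)
```

The real work is of course much messier (character encodings, local fields, item and holdings data), but the basic shape is the same: a declarative mapping from MARC tags and subfields to named index fields.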

SolrMarc is open source software made available by Bob Haschart of the University of Virginia Libraries. SolrMarc is used by all(?) VuFind sites as well as most Blacklight sites built on MARC data (e.g. SearchWorks). SolrMarc has been great for us -- it gave us an enormous jump start for SearchWorks. Bob is also a great guy, and made me a "committer" almost immediately -- so I can make contributions to the open source code.

But.

Open Source Software does best when there is a critical mass of developers: group wisdom rocks, as does sharing the work. To date, SolrMarc is very much Bob's project, despite a number of committers such as myself. There are some ... interesting ... practices as to how SolrMarc is organized and how it is tested. I've even contributed a bit to some of its squirreliness. Occasionally, changes to the SolrMarc codebase break the code I've written especially for Stanford.

Bob does a great job of juggling VuFind needs, Blacklight needs, UVa needs, less savvy consumers' needs, and maintaining backward compatibility with earlier versions of Solr. With all that, it's not surprising SolrMarc can be a bit "slow to turn." Meanwhile, technology advances, and we keep finding new tools to improve developer productivity.

In the interests of reducing my ongoing work for Stanford's SearchWorks index, I have, with Bob's blessing, forked the SolrMarc code. A "fork" is when someone takes an existing codebase and decides to create a different "edition" of the software, which will be improved and maintained separately.

This is an experiment: I believe my personal efforts will be reduced by using this pared-down derivative of SolrMarc. One goal of my fork is to simplify the code and the build scripts for development purposes. Some of my peers at other institutions have had similar frustrations, so I've made my fork publicly available. It adheres to best practices, such as automated test runs and automatic documentation builds.

It would be awesome if this fork converged with SolrMarc's future development to the point of re-combining the codebases. Meanwhile, as Bob and I have discussed, this fork may help Bob with some of his refactoring plans, and I can forge ahead with Stanford-specific needs more easily.

There are more technical details available here.