SARV: database for geocollections of Estonia

The text below is taken from a conference abstract presented at the 8th Baltic Stratigraphic Conference in Riga, August 2011. It may therefore not reflect the most recent developments of the system.

Collections can be effectively utilised only when they are appropriately catalogued and the information is readily accessible. Nowadays this is greatly facilitated by electronic databases. In Estonia the first efforts to use electronic databases for collection management were made at TUG in 1994. A few years later, development of a custom database designed specifically for geological collections and related information started at GIT. This database, now known as SARV, has since evolved from an institutional desktop application into a relational client-server information system deployed in three institutions. SARV aims to serve the needs of collection managers as well as of researchers who are seeking information or wish to store their data in a structured and easily searchable form.

The data model of SARV consists of more than a hundred related database tables, the most important of which are collection, specimen, sample, locality, drillcore, preparation, analysis, reference, agent, classification, stratigraphy, location and loan. The server-side software of SARV is based on open-source components, such as the Ubuntu Linux operating system, MySQL database server, Apache web server, PHP and Python scripting and various other tools. On the client (user) side, data entry and routine collection management procedures, such as accessioning, keeping track of specimens, and printing labels and loan invoices, still partly rely on a custom Microsoft Access application. However, with the emergence of new standards and technologies (e.g. AJAX, HTML5), the recent focus has been on switching to an entirely browser-based solution. An experimental web application built on top of the Django framework already replicates most of the MS Access functionality and shows good potential for fully replacing the desktop software.
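To illustrate what "related database tables" means in practice, here is a minimal sketch of how a few of the tables named above could be linked by foreign keys. It is not the actual SARV schema: only the table names come from the text, while every column, the helper functions `build_demo_db` and `specimens_at`, and the placeholder rows are assumptions made for illustration. SQLite from the Python standard library stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical, heavily simplified schema. The table names are taken from
# the text; all columns are assumptions and do not reflect the real SARV model.
SCHEMA = """
CREATE TABLE collection (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE locality (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE stratigraphy (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE specimen (
    id              INTEGER PRIMARY KEY,
    specimen_number TEXT NOT NULL,
    collection_id   INTEGER NOT NULL REFERENCES collection(id),
    locality_id     INTEGER REFERENCES locality(id),
    stratigraphy_id INTEGER REFERENCES stratigraphy(id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO collection VALUES (1, 'Demo collection')")
    conn.execute("INSERT INTO locality VALUES (1, 'Locality A')")
    conn.execute("INSERT INTO stratigraphy VALUES (1, 'Unit X')")
    conn.executemany(
        "INSERT INTO specimen VALUES (?, ?, ?, ?, ?)",
        [
            (1, "DEMO-1", 1, 1, 1),
            (2, "DEMO-2", 1, 1, None),  # stratigraphic unit not recorded
        ],
    )
    conn.commit()
    return conn


def specimens_at(conn, locality_name):
    """Return (specimen number, collection, stratigraphic unit) for a locality."""
    query = """
        SELECT s.specimen_number, c.name, st.name
        FROM specimen AS s
        JOIN collection AS c ON c.id = s.collection_id
        JOIN locality AS l ON l.id = s.locality_id
        LEFT JOIN stratigraphy AS st ON st.id = s.stratigraphy_id
        WHERE l.name = ?
        ORDER BY s.specimen_number
    """
    return conn.execute(query, (locality_name,)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in specimens_at(db, "Locality A"):
        print(row)
```

The `LEFT JOIN` on stratigraphy keeps specimens whose stratigraphic unit has not been recorded, a common situation while cataloguing is still in progress. A full model would add further tables (sample, drillcore, analysis, loan and so on) that reference these in the same way.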

SARV has a publicly accessible web interface at http://geocollections.info, where users can search for information related to individual collection objects, fossil species, stratigraphical terms, image files, etc. The data on the website can be freely used for non-commercial purposes under a Creative Commons license. SARV can also be accessed via the BioCASe (http://www.biocase.org), GBIF (http://www.gbif.org) and recently established GeoCASe (Geosciences Collection Access Service, http://www.geocase.eu) specimen-level data networks.

Figure: Architecture and main components of the geocollections database used in Estonia.

As of 2011, approximately a quarter of Estonian geological collections are electronically catalogued at the unit level. The majority of the important collections, type and cited fossil specimens in particular, are already in the database. In addition to the registration of physical collection objects, the system contains a growing amount of related information, including digitised photo archives, scanned field notebooks, results of geochemical analyses, and annotated taxonomic and stratigraphical dictionaries. Most of that information is freely accessible online.

In summary, we have learned that a functional database can be developed with rather limited resources, time being the exception. It requires, however, close collaboration between collection managers, researchers and technical developers. The broader benefits of the database appear once a critical amount of data is available digitally and the collection-oriented data are linked to other kinds of scientific information. The recently approved national research infrastructure roadmap and a new INTERREG project will enable further professional development of SARV, foster data entry and increase the visibility and application of Estonian geological collections in the coming years.

Chronological highlights of database development

  • 1994 - First attempts at databasing geological collections at TUG
  • 1996 - Start of digital cataloguing of geological specimens, using Lotus 1-2-3 and MS Excel at GIT
  • 1998 - Initial version of multi-table database based on MS Access 97
  • 2000 - MS Access-based multi-user networked database
  • 2002 - Deployment of MySQL database server software in MS Windows environment
    First public website for accessing the collections database, included web map server
  • 2003 - Collaboration with Estonian Museum of Natural History (ELM), testing the same software and data model there
    Joining the BioCASe specimen-level data network as the first in Estonia
  • 2004 - First dedicated server hardware, migration to Red Hat Linux operating system
    The name "SARV" was first used for the database
    More functional public web portal for data access
  • 2005 - Adjusting the database structure and building a web interface for ELM
    Migration to Debian Linux
  • 2006 - First attempts to deploy SARV in the Museum of Geology, University of Tartu
  • 2007 - Linking with Google Maps web map service, smart-card authentication for restricted web-based interface
  • 2008 - Common web portal for three institutional databases, updates of other web interfaces
    Specific modules for taxonomy and fossil species and photo archives
  • 2010 - Joining the international GeoCASe data network
  • 2011 - Upgrade of server hardware, prototype of web-based thin client to replace legacy MS Access application