Chapter 12: The iVia Databases

The iVia Master Database

The iVia record data is stored in the iVia master database. The master database is a MySQL database and can reside on the same machine as the iVia installation, or on a dedicated database server. Setting up the MySQL database so that the iVia user can connect is part of the installation process described in Chapter 4.

The master database is comprised of several tables, described in Table 12.1 below.

Table 12.1: Tables in the iVia database
record_info Metadata records in the live database (i.e. metadata that can be searched and viewed by users).
pending_new_records Metadata records that are not yet ready to be added to the live database, or which have been deleted and retained.
modification_history Details of the modifications made to records in the record_info and pending_new_records tables.
my_infomine Details of the meanings and permissions for the my_i descriptors, stored in the my_infomine field of record_info and pending_new_records databases.
adders_info Information about the iVia adders.
campuses_info Information about the institutions that share this iVia installation. Each adder must be affiliated with an institution.
statistics Historical record of statistics like the number of records in each category, and so on.
email Subscriber information for the iVia new resource alert service.
version The iVia database version (used to update the database without harming data when new versions of iVia are released).

The most important table is called record_info. Each row in record_info contains the metadata describing a single resource, which is described by over thirty fields including a unique identifier and URL; provenance and maintainer information; Title, Creator, Keywords, Description and other Dublin Core metadata; LCSH and LCC metadata; MyI, audience level, usage, and other iVia-specific metadata; and the full text and URL status of the resource. The pending records database contains the data currently being edited by the adders, and has the same structure as record_info with a few additional fields. Whenever an adder changes an iVia record, their changes are logged in the modification_history table.

The Inverted Index Database

Once there is data in the master database, an inverted index database is created to support the iVia search functions.

The iVia inverted index is a Berkeley DB database that associates every word (and many of the phrases) that appear in the collection with a list of the records, offsets, and fields in which they occur. The canned_search program uses these indexes to perform searches.

Warning: Berkely DB is sometimes unstable under very high loads. If you have trouble with these databases, consider disabling incremental inverted index updates.

The Inverted Index Files

The inverted index databases are stored in the inverted_indexes_dir directory identified in the main iVia.conf file. The inverted indexes will comprise a set of Berkeley DB database files (whose filenames end in ".db") and three other files used internally by Berkeley DB for file locking, shared memory management, and similar functions.

To regenerate the inverted index files, run the create_inverted_indexes.sh script. Usually this will be done as part of the regular nightly tasks.

The database files should always have the user set correctly for your installation, and be readable and writable. They are read by the every time a patron performs a canned search, and whenever you run the word_level_data_dumper command or one of the commands used by create_inverted_indexes.sh. Whenever the databases are accessed, the program also needs to be able to WRITE to the lock files. (Note: This causes problems if your apache installation has a special user like www-data and the lock files are deleted because they may be recreated by the www-data user without the necessary read and write permissions. Normally, you will need to fix this manually as root.)

Debugging the Inverted Index Database

If the files become corrupt (searches stop working) fix them as described in the chapter on Troubleshooting).

You can check what iVia data is in a database with the word_level_data_dumper program.

Berkeley DB comes with diagnostic utilities: db_stat and db_recover. (In Debian, these are renamed according to the version in use; for Berkley DB version 4.2 install the libdb4.2-util package where the programs renamed to db4.2_stat and db4.2_recover.)

iVia provides wrapper scripts around these functions: check_berkely_env.sh will quickly check the environment files (fast), and check_inverted_indexes.sh will examine the databases for corruption (slow). For added reliability, you may want to run check_berkely_env.sh as a cron job every five minutes.