The iVia files

iVia is made up of three types of files: programs (binaries and scripts), themes (HTML template files), and data files (inverted indexes). Some of the programs are installed in the IVIA_INSTALLED/bin directory; others are installed as CGI programs and run by Apache. The themes are all installed in IVIA_INSTALLED/data/html_templates and are read by the CGI programs to produce dynamic output. The inverted indexes are located in IVIA_INSTALLED/data/inverted_indexes.

iVia command line programs

Table of iVia command line programs

Program                                 Description
MARC_importer                           Import MARC records into the iVia database.
OAI-PMH-importer                        Import records via OAI-PMH.
afcrawler                               Run an automatic focused crawler process.
afcrawlerd                              Run an afcrawler daemon process.
alert_service_email_dispatcher          Send email to users.
browse                                  ?
cache_compactor                         Compact the page_cache.
check_berkeley_env.sh                   Check the Berkeley databases.
check_inverted_indexes.sh               Check the inverted index databases. (Calls check_berkeley_env.sh)
click_throughd                          Daemon process that collects information about URLs clicked on through the search interface.
create_flat_record_level_index          Create a record level index. (Run by nightly.sh)
create_hierarchical_record_level_index  Create a hierarchical record level index. (Run by nightly.sh)
create_inverted_indexes.sh              Create the inverted indexes. (Run by nightly.sh)
create_ispell_dictionary.sh             Create the ispell dictionary. (Run by nightly.sh)
create_lcc_outline_hierarchy            Create the LCC hierarchy. (Run by nightly.sh)
create_restricted_access_index          Create the restricted access index. (Run by nightly.sh)
create_word_level_index                 Create the word level index. (Run by nightly.sh)
db_detect.sh                            Detect the Berkeley DB version.
delete_blacklisted_records              Delete blacklisted records in record_info. (Run by nightly.sh)
delete_duplicate_records                Delete duplicate records in record_info. (Run by nightly.sh)
delete_non_english_records              Delete non-English records in record_info.
delete_stale_records                    Delete stale records in record_info.
evaluate_metadata_assignment            Evaluate the metadata assignment.
exec_cgi                                Execute a CGI program.
expert_guided_crawler                   Expert guided crawler.
generate_browse_LCC_data                Generate the LCC browse hierarchy. (Run by nightly.sh)
generate_browse_list_data               Generate the browse list. (Run by nightly.sh)
generate_browse_tree_data               Generate the browse tree. (Run by nightly.sh)
generate_infomine_category_classifiers  Create INFOMINE category classifiers.
generate_page_digests                   Generate page digests.
generate_record_count.sh                Generate the record count.
generate_statistics                     Generate database statistics.
generate_statistics_html                Generate statistics pages.
generate_word_frequencies               Generate word frequencies.
hourly.sh                               Hourly maintenance script. (Intended to be run via cron.)
identical.sh                            Test records for duplicates.
index_record                            Index a record.
ivia-adduser                            Add a user to the iVia system.
ivia-config                             Query the iVia configuration file.
nightly.sh                              Nightly maintenance script.
page_cache_server                       Currently not used.
page_dump_importer                      Import a Nalanda page dump into an iVia database.
query_logger                            Log iVia search queries.
read_ini_file                           Query an iVia INI file.
record_info_to_html                     Create HTML files from a record_info database.
record_level_data_dumper                Run a query on the record_info database.
site_checker                            Check all "sites" (URLs) in the record_info database. Typically run via nightly.sh.
site_checker_single                     Check a "site" (URL).
train_lcc_outline_classifier            Generate a hierarchy of LCC classifier files from a training corpus.
update_assign_fields                    Update field assignments.
update_canonize_fields                  Canonize fields in the record_info or pending_new_records tables.
vlcrawler                               Run a virtual library crawler process.
vlcrawler_single                        Crawl a single virtual library.
vlcrawlerd                              Run a vlcrawler daemon process.
weekly.sh                               Weekly maintenance script. (Intended to be run via cron.)
word_level_mask_decoder                 Decode an inverted index word level mask. (Intended for debugging inverted indexes.)

iVia template files

The iVia template files are located in IVIA_INSTALLED/data/html_templates and are organized into themes. To create a new theme, copy the directory tree of a supplied theme and edit the copied files to suit. There are themes for the public pages and for dbase_manage, the adder interface to iVia. The theme is selected in the iVia.conf file located in the IVIA_INSTALLED/etc directory.

The following is a hierarchical layout of the template files for three different uses in iVia. The first, under dbase_manage, is a theme for the adders' interface. The second, located in IVIA_INSTALLED/data/html_templates/iVia, is a public theme for the public search engine area of iVia. The third is a micro theme designed to be named in the URL when including search results in other pages (for example, http://infomine.ucr.edu/canned_search?theme=p_title_desc).

In the list below, bullets with no file extension are folders or sub-folders in the html_templates directory tree. Each folder is named after the CGI program that uses the files it contains: the canned_search program, for example, uses the template files in the canned_search folder.
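Creating a theme by copying an existing one can be sketched in Python as follows. This is only an illustration of the copy-then-edit workflow, not an iVia tool: the function names, the theme name my_theme, and the file name results.html are hypothetical, and the demonstration runs in a temporary directory rather than in IVIA_INSTALLED/data/html_templates.

```python
import shutil
import tempfile
from pathlib import Path

def copy_theme(templates_dir, source_theme, new_theme):
    """Copy an existing theme directory to a new name and return the new path."""
    source = Path(templates_dir) / source_theme
    target = Path(templates_dir) / new_theme
    shutil.copytree(source, target)   # raises FileExistsError if target exists
    return target

def programs_with_templates(theme_dir):
    """List the sub-folders of a theme; each is named after a CGI program."""
    return sorted(p.name for p in Path(theme_dir).iterdir() if p.is_dir())

# Demonstration on a throwaway directory tree (all names are hypothetical).
with tempfile.TemporaryDirectory() as templates_dir:
    folder = Path(templates_dir) / "iVia" / "canned_search"
    folder.mkdir(parents=True)
    (folder / "results.html").write_text("<html></html>")
    new_theme = copy_theme(templates_dir, "iVia", "my_theme")
    print(programs_with_templates(new_theme))
```

After copying, the files under the new theme directory can be edited freely without touching the supplied theme.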

The Inverted Index Database

Once there is data in the master database, an inverted index database is created to support the iVia search functions.

The iVia inverted index is a Berkeley DB database that associates every word (and many of the phrases) that appear in the collection with a list of the records, offsets, and fields in which they occur. The canned_search program uses these indexes to perform searches.
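To make the idea of an inverted index concrete, here is a minimal in-memory sketch in Python. It is not the iVia implementation: iVia stores its index in Berkeley DB and also indexes phrases, whereas this sketch handles single words only and keeps everything in a dictionary. The sample records and function names are invented for illustration.

```python
from collections import defaultdict

def build_inverted_index(records):
    """Map each lowercased word to a list of (record_id, field, offset) postings.

    `records` is a dict of record_id -> {field_name: text}; offsets count
    words from the start of each field, beginning at 0.
    """
    index = defaultdict(list)
    for record_id, fields in records.items():
        for field, text in fields.items():
            for offset, word in enumerate(text.lower().split()):
                index[word].append((record_id, field, offset))
    return index

def search(index, word):
    """Return the sorted record ids whose fields contain `word`."""
    return sorted({record_id for record_id, _, _ in index.get(word.lower(), [])})

# Hypothetical sample records, invented for this illustration.
records = {
    1: {"title": "Digital library crawler", "description": "A focused crawler"},
    2: {"title": "Library of Congress outline"},
}
index = build_inverted_index(records)
print(search(index, "library"))   # record ids containing "library"
print(index["crawler"])           # postings: (record, field, offset)
```

Because each posting carries the field and word offset as well as the record, a search can be restricted to particular fields or extended to adjacent-word matching without rereading the records themselves.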

Warning: Berkeley DB is sometimes unstable under very high loads. If you have trouble with these databases, consider disabling incremental inverted index updates.

The Inverted Index Files

The inverted index databases are stored in the inverted_indexes_dir directory identified in the main iVia.conf file. The inverted indexes will comprise a set of Berkeley DB database files (whose filenames end in ".db") and three other files used internally by Berkeley DB for file locking, shared memory management, and similar functions.

To regenerate the inverted index files, run the create_inverted_indexes.sh script. Usually this will be done as part of the regular nightly tasks.

The database files should always be owned by the correct user for your installation, and be readable and writable by that user. They are read by the canned_search program every time a patron performs a search, and whenever you run the word_level_data_dumper command or one of the commands used by create_inverted_indexes.sh. Whenever the databases are accessed, the program also needs to be able to write to the lock files. (Note: This causes problems if your Apache installation runs as a special user such as www-data and the lock files are deleted, because they may be recreated by the www-data user without the necessary read and write permissions. Normally, you will need to fix this manually as root.)
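One way to spot permission problems before they break searches is to list the files that the current user cannot both read and write. The Python sketch below is an assumption-laden illustration, not part of iVia: the function name and the example file names are hypothetical, and the demonstration uses a temporary directory in place of the real inverted_indexes_dir.

```python
import os
import tempfile

def find_inaccessible_files(directory):
    """Return names of regular files in `directory` that the current user
    cannot both read and write."""
    problems = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and not os.access(path, os.R_OK | os.W_OK):
            problems.append(name)
    return problems

# Demonstration on a throwaway directory; in practice you would pass the
# inverted_indexes_dir value from iVia.conf instead.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("example.db", "example.lock"):   # hypothetical file names
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write("placeholder")
    print(find_inaccessible_files(demo_dir))       # nothing to report here
```

The check is only meaningful when run as the user that actually opens the databases (for example, the user Apache runs as), since os.access tests the permissions of the calling user; run as root, it will report no problems.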

Debugging the Inverted Index Database

If the files become corrupt (searches stop working), fix them as described in the chapter on Troubleshooting.

You can check what iVia data is in a database with the word_level_data_dumper program.

Berkeley DB comes with diagnostic utilities: db_stat and db_recover. (In Debian, these are renamed according to the version in use; for Berkeley DB version 4.2, install the libdb4.2-util package, where the programs are renamed to db4.2_stat and db4.2_recover.)
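Because the utility names differ between systems, a script that calls them has to work out which name is installed. The following Python sketch shows one way to do that under the naming pattern described above; the function names are hypothetical, and other distributions or versions may use names this sketch does not cover.

```python
import shutil

def candidate_names(tool, version):
    """Return possible executable names for a Berkeley DB utility.

    `tool` is "stat" or "recover"; `version` is a string such as "4.2".
    """
    return [f"db_{tool}", f"db{version}_{tool}"]

def locate_utility(tool, version):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidate_names(tool, version):
        path = shutil.which(name)
        if path:
            return path
    return None

print(candidate_names("stat", "4.2"))
print(locate_utility("recover", "4.2"))   # None if the utilities are not installed
```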

iVia provides wrapper scripts around these functions: check_berkeley_env.sh checks the environment files (fast), and check_inverted_indexes.sh examines the databases for corruption (slow). For added reliability, you may want to run check_berkeley_env.sh as a cron job every five minutes.