Chapter 3: The iVia Architecture
Figure 1 gives a simplified overview of the iVia architecture. Conceptually, there are three main components: the public Web site, the Adders’ Web site, and the master database. There is also a range of smaller, optional components, such as the batch programs that run at regular intervals (hourly, nightly, weekly). The components can all run on a single desktop machine or, for large installations, be distributed across several computers that share a database.
A single modern workstation can easily handle databases containing tens of thousands of records. Larger installations may require additional hardware. For example, INFOMINE (http://infomine.ucr.edu/) currently contains hundreds of thousands of records and serves thousands of searches each day. Its production system currently runs on three machines: a dedicated database server for the iVia master database (dual 500MHz Pentium II); a Web server hosting the Adders’ Web site and the public Web site (dual 1GHz Pentium III with 2GB RAM); and a supporting "helper" installation that runs the batch processes and handles MARC and OAI-PMH imports. Our other installations include a Web crawler that maintains a collection of over 700,000 records (with no public search interface) on two machines (both AMD Athlon 3200+ with 2GiB RAM).
The iVia Master Database
The iVia data is stored in the master database, which consists of several tables. The most important is the record_info table, which contains the metadata available through the public Web site. Each row represents an Internet resource, described by over thirty fields including a unique identifier and URL; provenance and maintainer information; Title, Creator, Keywords, Description and other Dublin Core metadata; LCSH and LCC metadata; MyI, audience level, usage, and other iVia-specific metadata; and the full text and URL status of the resource. The pending records database contains the data currently being edited by the adders, and has the same structure as record_info with a few additional fields. Whenever an adder changes an iVia record, the changes are logged in the modification_history table.
The master database is a MySQL database (any SQL-based database could be used with some reprogramming) and can reside on the same machine as the iVia installation, or on a dedicated database server.
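The relationship between record_info and modification_history can be illustrated with a small sketch. It is not iVia's actual schema: the column names, the update_field helper, and the use of an in-memory SQLite database in place of MySQL are all assumptions made so that the example is self-contained.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL master database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record_info (          -- column names are hypothetical
        record_id   INTEGER PRIMARY KEY,
        url         TEXT NOT NULL,
        title       TEXT,
        description TEXT
    );
    CREATE TABLE modification_history ( -- column names are hypothetical
        record_id   INTEGER,
        adder       TEXT,
        field       TEXT,
        old_value   TEXT,
        new_value   TEXT
    );
""")

# Whitelist of editable columns, so a field name is never taken on trust.
ALLOWED_FIELDS = {"url", "title", "description"}


def update_field(conn, record_id, adder, field, new_value):
    """Change one field of a record and log the change."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field: {field}")
    with conn:  # one transaction: both writes commit, or neither does
        row = conn.execute(
            f"SELECT {field} FROM record_info WHERE record_id = ?",
            (record_id,),
        ).fetchone()
        if row is None:
            raise LookupError(f"no record {record_id}")
        conn.execute(
            f"UPDATE record_info SET {field} = ? WHERE record_id = ?",
            (new_value, record_id),
        )
        conn.execute(
            "INSERT INTO modification_history VALUES (?, ?, ?, ?, ?)",
            (record_id, adder, field, row[0], new_value),
        )


conn.execute(
    "INSERT INTO record_info VALUES (1, 'http://example.org/', 'Old title', '')"
)
update_field(conn, 1, "alice", "title", "New title")
history = conn.execute("SELECT * FROM modification_history").fetchall()
```

Keeping the update and the log entry in a single transaction means the history cannot drift out of step with the records, which is the property the modification log depends on.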
The Inverted Index Databases
The inverted index databases (or more simply, the "inverted indexes") contain occurrence data for every word or phrase appearing in the publicly searchable fields of the record_info table (Title, Creator, Keywords, LCSH, Description, MyI, and a few others) or in the full text of the document. The inverted indexes may be dynamically updated whenever an Adder submits a change to the record_info table, and are also rebuilt each night to incorporate changes made in batches by the supporting programs.
The inverted index databases are implemented using Berkeley DB, a fast, file-based database that allows dynamic updates and supports multiple concurrent accesses. In practice, iVia maintains several different database files. They must reside on the same server as the iVia Web sites.
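To make the idea of occurrence data and dynamic updates concrete, here is a minimal in-memory sketch. It is only an illustration: iVia stores its indexes in Berkeley DB files, whereas this example uses plain Python dictionaries, and the class name, field subset, and tokenizer are assumptions.

```python
import re
from collections import defaultdict

# Hypothetical subset of the publicly searchable fields.
SEARCHABLE_FIELDS = ("title", "keywords", "description")


class InvertedIndex:
    """Maps each word to the records and fields in which it occurs."""

    def __init__(self):
        # word -> {record_id -> {field -> occurrence count}}
        self.postings = defaultdict(dict)
        # record_id -> words indexed for it, so old entries can be removed
        self.words_by_record = defaultdict(set)

    def remove(self, record_id):
        """Drop every posting that belongs to one record."""
        for word in self.words_by_record.pop(record_id, set()):
            self.postings[word].pop(record_id, None)
            if not self.postings[word]:
                del self.postings[word]

    def update(self, record_id, record):
        """Re-index one record: remove stale postings, then add fresh ones."""
        self.remove(record_id)
        for field in SEARCHABLE_FIELDS:
            text = record.get(field, "").lower()
            for word in re.findall(r"[a-z0-9]+", text):
                counts = self.postings[word].setdefault(record_id, {})
                counts[field] = counts.get(field, 0) + 1
                self.words_by_record[record_id].add(word)


index = InvertedIndex()
index.update(1, {"title": "Citrus pests", "description": "Pests of citrus trees"})
index.update(2, {"title": "Desert plants"})
index.update(1, {"title": "Citrus diseases"})  # an Adder edits record 1
```

After the last call, the words that appeared only in the old version of record 1 (such as "pests") are no longer in the index, which mirrors the dynamic update described above. A nightly rebuild would amount to creating a fresh index and calling update for every record.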
The Public Web Site
The public Web site is the interface through which iVia's "patrons" can search and browse the metadata records. It consists of both static web pages and CGI scripts for searching the database (canned_search), viewing records (view_record), browsing recent additions (whats_new) and other functions.
The canned_search CGI program performs searches on the iVia data. Patrons typically search by submitting forms or selecting hyperlinks. The program then looks up the query terms in the inverted indexes, retrieves the matching records from the master database, sorts them, and outputs the results. canned_search has many options for specifying search parameters and output formats.
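The four steps of a search (look up, retrieve, sort, output) can be sketched as follows. This is not the canned_search implementation: the toy data, the function names, the AND semantics for multi-term queries, and the sort by title are all assumptions chosen to keep the example short.

```python
# Toy stand-ins for the inverted indexes and the master database.
INVERTED_INDEX = {
    "citrus": {1, 3},
    "pests": {1, 2},
    "desert": {2},
}
RECORDS = {
    1: {"title": "Citrus Pests", "url": "http://example.org/a"},
    2: {"title": "Desert Pests", "url": "http://example.org/b"},
    3: {"title": "Citrus Varieties", "url": "http://example.org/c"},
}


def search(query, sort_key="title"):
    """Return the records matching every term in the query, sorted."""
    terms = query.lower().split()
    if not terms:
        return []
    # 1. Look up each query term in the inverted index.
    id_sets = [INVERTED_INDEX.get(term, set()) for term in terms]
    # A record must contain every term (AND semantics, an assumption).
    matching_ids = set.intersection(*id_sets)
    # 2. Retrieve the full records from the record store.
    results = [RECORDS[record_id] for record_id in matching_ids]
    # 3. Sort the results.
    results.sort(key=lambda record: record[sort_key])
    return results


def render(results):
    """4. Output the results, here as one plain-text line per record."""
    return "\n".join(f"{r['title']} <{r['url']}>" for r in results)


titles = [r["title"] for r in search("citrus")]
single = render(search("citrus pests"))
```

Separating the index lookup from record retrieval means only the matching records are read from the record store, which is the reason for keeping inverted indexes alongside the master database.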
The Adders’ Web site
The Adders’ Web site is used by librarians to maintain the master database. Adders log in to iVia and can then edit the master database using a variety of tools. Each user has a set of access privileges that control which tools they may use and how they may directly change the master database (if at all). These permissions are stored in the master database and can be changed by senior users.
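A per-user privilege check of this kind might look like the sketch below. The privilege names ("edit_records", "manage_users"), the Adder class, and the helper functions are hypothetical; in iVia the permissions live in the master database rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Adder:
    username: str
    # Hypothetical privilege names, e.g. "edit_records", "manage_users".
    privileges: set = field(default_factory=set)


def require(adder, privilege):
    """Raise PermissionError unless the adder holds the privilege."""
    if privilege not in adder.privileges:
        raise PermissionError(f"{adder.username} lacks '{privilege}'")


def grant(senior, target, privilege):
    """Let a senior user give a privilege to another user."""
    require(senior, "manage_users")
    target.privileges.add(privilege)


senior = Adder("pat", {"edit_records", "manage_users"})
junior = Adder("sam")

grant(senior, junior, "edit_records")
require(junior, "edit_records")  # passes silently after the grant
```

Each tool would call require before making a change, so a user without the relevant privilege is stopped before the master database is touched.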
Supporting Programs
The supporting programs shown in Figure 1 are run nightly, weekly, or even hourly to perform maintenance and provide useful features.
The email alert service reads a list of registered users from the alert service subscribers table in the master database and sends periodic email messages listing the new resources. The URL checker reads the URLs from the record_info table and verifies that they are still accessible over the Internet and have not significantly changed. The generate browse pages program rebuilds the public browse pages each night to reflect changes to the record_info table. Several other programs perform routine tasks such as checking database integrity and generating statistics.
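As an illustration of what a URL check involves, the sketch below classifies a URL as unreachable, changed, or ok. It is not iVia's URL checker: the function names, the use of a text-similarity ratio to decide what counts as "significantly changed", and the 0.8 threshold are assumptions.

```python
import difflib
import http.client
import urllib.request


def fetch(url, timeout=10):
    """Return the page text, or None if the URL cannot be retrieved."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        # Covers network and HTTP errors, timeouts, and malformed URLs.
        return None


def check_url(url, stored_text, fetcher=fetch, threshold=0.8):
    """Compare the live page with the stored full text of the record."""
    current = fetcher(url)
    if current is None:
        return "unreachable"
    similarity = difflib.SequenceMatcher(None, stored_text, current).ratio()
    return "ok" if similarity >= threshold else "changed"


# Usage with a stub fetcher, so the example runs without network access.
pages = {"http://example.org/a": "Citrus pests and their control"}
status = check_url(
    "http://example.org/a", "Citrus pests and their control", fetcher=pages.get
)
```

Passing the fetcher as a parameter keeps the comparison logic testable offline; a batch job would call check_url with the default fetcher for each row of record_info and store the returned status.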
A few of the programs in Figure 1 have more specialized roles. The automatic focused crawler performs a subject-specific web crawl for new Internet resources, creates records for them, and adds them to the record_info table as "second-class" resources (these are excluded from search results by default). The automatic LCC classifier automatically assigns a Library of Congress Classification (LCC) to each record based on its LCSH metadata.
File Layout
iVia is easiest to manage if you use the default layout, as explained in the INSTALL file located in the source tarball.
The default layout is as follows:
- Standard Web Pages, JavaScript, Stylesheets: /var/www/IVIA_DIR/htdocs
- Public CGI programs: /var/www/IVIA_DIR/cgi-bin
- Secure (Adders) CGI programs: /var/www/IVIA_DIR/secure_scripts
- HTML Template Files: /home/USER/iVia-installed/data/html_templates
- Configuration Files: /home/USER/iVia-installed/etc
- Other supporting Programs: /home/USER/iVia-installed/bin
Template Files
iVia uses its own macro processor to display dynamic page content. HTML template files are fragments of pages that the CGI programs process to produce whole pages. End users can modify these template files to produce a user interface that suits their needs. To discover which variables are available to a page, call the $CALL(PRINT_MACROS) function from within its template file; it lists the variables that have been defined by the program that processes that page. This is useful when you are editing the template files. Please be advised that iVia does not guarantee that a given template file will exist in new releases, nor does it actively support modified template files. As always, we will do our best to help if we can.
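The general idea of template expansion can be shown with a toy macro expander. This is not iVia's macro processor and does not reproduce its syntax: only $CALL(PRINT_MACROS) comes from the text above, while the $NAME form for variables, the output format of the listing, and the expand function are assumptions.

```python
import re

# Matches $CALL(PRINT_MACROS) or a variable written as $NAME.
# The $NAME form is an assumed syntax, used here for illustration only.
PATTERN = re.compile(r"\$CALL\(PRINT_MACROS\)|\$([A-Z_][A-Z0-9_]*)")


def expand(template, variables):
    """Expand a template fragment in a single pass."""

    def replace(match):
        name = match.group(1)
        if name is None:
            # Matched $CALL(PRINT_MACROS): list every defined variable.
            return "\n".join(
                f"{key} = {value}" for key, value in sorted(variables.items())
            )
        # Unknown variables are left untouched so they are easy to spot.
        return str(variables.get(name, match.group(0)))

    return PATTERN.sub(replace, template)


variables = {"TITLE": "Citrus Pests", "RECORD_COUNT": 3}
page = expand("<h1>$TITLE</h1><p>$RECORD_COUNT records</p>", variables)
listing = expand("$CALL(PRINT_MACROS)", variables)
```

Expanding in a single pass means that text inserted for one variable is never scanned again, so a value that happens to contain a dollar sign is not expanded a second time.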