The iVia files
iVia is split into three types of files: programs (binary files and scripts), themes (template HTML files), and data files (inverted indexes). Some of the programs are located in the IVIA_INSTALLED/bin directory; others are installed as CGI programs and run by Apache. The themes are all installed in IVIA_INSTALLED/data/html_templates and are read by the CGI programs to produce dynamic output. The inverted indexes are located in IVIA_INSTALLED/data/inverted_indexes.
iVia command line programs
| Program | Description |
|---|---|
| MARC_importer | Import MARC records into the iVia database. |
| OAI-PMH-importer | Import records via the OAI-PMH protocol. |
| afcrawler | Run an automatic focused crawler process. |
| afcrawlerd | Run an afcrawler daemon process. |
| alert_service_email_dispatcher | Send email to users. |
| browse | ? |
| cache_compactor | Compact the page_cache. |
| check_berkeley_env.sh | Check the Berkeley DB environment files. |
| check_inverted_indexes.sh | Check the inverted index databases. (Calls check_berkeley_env.sh) |
| click_throughd | Daemon process that collects information about URLs clicked on through the search interface. |
| create_flat_record_level_index | Create a record level index. (Run by nightly.sh) |
| create_hierarchical_record_level_index | Create a hierarchical record level index. (Run by nightly.sh) |
| create_inverted_indexes.sh | Create the inverted indexes. (Run by nightly.sh) |
| create_ispell_dictionary.sh | Create the ispell dictionary. (Run by nightly.sh) |
| create_lcc_outline_hierarchy | Create the LCC outline hierarchy. (Run by nightly.sh) |
| create_restricted_access_index | Create restricted access index. (Run by nightly.sh) |
| create_word_level_index | Create word level index. (Run by nightly.sh) |
| db_detect.sh | Detect the Berkeley version. |
| delete_blacklisted_records | Delete blacklisted records in record_info. (Run by nightly.sh) |
| delete_duplicate_records | Delete duplicate records in record_info. (Run by nightly.sh) |
| delete_non_english_records | Delete non-English records in record_info. |
| delete_stale_records | Delete stale records in record_info. |
| evaluate_metadata_assignment | Evaluate the metadata assignment. |
| exec_cgi | Execute a CGI program. |
| expert_guided_crawler | Expert guided crawler. |
| generate_browse_LCC_data | Generate LCC browse hierarchy. (Run by nightly.sh) |
| generate_browse_list_data | Generate browse list. (Run by nightly.sh) |
| generate_browse_tree_data | Generate browse tree. (Run by nightly.sh) |
| generate_infomine_category_classifiers | Create INFOMINE category classifiers. |
| generate_page_digests | Generate page digests. |
| generate_record_count.sh | Generate record count. |
| generate_statistics | Generate database statistics. |
| generate_statistics_html | Generate statistics pages. |
| generate_word_frequencies | Generate word frequencies. |
| hourly.sh | Hourly maintenance script. (Intended to be run via cron.) |
| identical.sh | Test records for duplicates. |
| index_record | Index a record. |
| ivia-adduser | Add a user to the iVia system. |
| ivia-config | Query the iVia configuration file. |
| nightly.sh | Nightly maintenance script. |
| page_cache_server | Currently not used. |
| page_dump_importer | Import a Nalanda page dump into an iVia database. |
| query_logger | Log iVia search queries. |
| read_ini_file | Query an iVia INI file. |
| record_info_to_html | Create HTML files from a record_info database. |
| record_level_data_dumper | Run a query on the record_info database. |
| site_checker | Check all "sites" (URLs) in the record_info database. Typically runs via nightly.sh. |
| site_checker_single | Check a "site" (URL). |
| train_lcc_outline_classifier | Generates a hierarchy of LCC classifier files from a training corpus. |
| update_assign_fields | Update field assignments. |
| update_canonize_fields | Canonizes fields in the record_info or pending_new_records tables. |
| vlcrawler | Run a virtual library crawler process. |
| vlcrawler_single | Crawls a single virtual library. |
| vlcrawlerd | Run a vlcrawler daemon process. |
| weekly.sh | Weekly maintenance script. (Intended to run via cron.) |
| word_level_mask_decoder | Decodes an inverted index word level mask. (Intended for debugging inverted indices.) |
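The table notes that hourly.sh and weekly.sh are intended to be run via cron, and nightly.sh is by its name a natural candidate for the same treatment. The following is a minimal sketch of how such a crontab could be generated. The function name print_maintenance_crontab and all of the times of day are assumptions chosen for illustration, not values taken from iVia.

```shell
#!/bin/sh
# Sketch: print crontab entries for the iVia maintenance scripts.
# ASSUMPTION: the schedule below (minute 5 of each hour, 02:30 nightly,
# 04:30 on Sundays) is illustrative only; pick times that suit your site.

print_maintenance_crontab() {
    bin_dir=$1   # directory holding the scripts, e.g. IVIA_INSTALLED/bin

    # Field order: minute hour day-of-month month day-of-week command
    cat <<EOF
5 * * * * $bin_dir/hourly.sh
30 2 * * * $bin_dir/nightly.sh
30 4 * * 0 $bin_dir/weekly.sh
EOF
}

# Example (not run automatically): review the output, then add it to the
# crontab of the user that owns the iVia installation.
#   print_maintenance_crontab /path/to/IVIA_INSTALLED/bin
```

Printing the entries rather than installing them directly leaves the administrator free to review and merge them with an existing crontab.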
iVia template files
The iVia template files are located in IVIA_INSTALLED/data/html_templates and are organized into themes. To create a new theme, copy the directory tree of an existing theme and edit the copied files to suit. There are themes for the public pages and for dbase_manage, the adder interface to iVia. The theme is selected in the iVia.conf file located in the IVIA_INSTALLED/etc directory.
The following is a hierarchical layout of the template files for three different uses in iVia. The first, under dbase_manage, is a theme for the adder interface. The second, located in IVIA_INSTALLED/data/html_templates/iVia, is a theme for the public search engine area of iVia. The third is a micro theme designed to be named in the URL when including search results in other pages (for example, http://infomine.ucr.edu/canned_search?theme=p_title_desc).
In the list below, bullets that contain no extension are folders or sub-folders in the html_templates directory tree. Each folder is named after the CGI program that uses the files it contains: the canned_search program, for example, uses the template files in the canned_search folder.
- dbase_manage (Adder management themes)
- iVia (iVia adder management theme)
- add_adder
- exists.html
- success.html
- adders
- deleted.html
- public_info.html
- report_page
- bottom.html
- data.html
- top.html
- add_remove_js.html
- batch
- delete.html
- delete_records_bottom.html
- delete_records_top.html
- export.html
- export_records_no_results.html
- export_records_preview.html
- import.html
- import_MARC_setup.html
- import_MARC_tape.html
- blacklister
- bottom.html
- repeated_section.html
- top.html
- blacklist_url
- get_url.html
- help.html
- prompt_begin.html
- prompt_end.html
- prompt_section_begin.html
- prompt_section_data.html
- update_begin.html
- update_data.html
- update_end.html
- campus_management
- submit_new_campus.html
- view_current_campuses_bottom.html
- view_current_campuses_data.html
- view_current_campuses_top.html
- canned_search
- bottom.html
- data.html
- invalid_search_parameters.html
- top.html
- zero_results.html
- change_adders
- bottom.html
- delete_begin.html
- delete_data.html
- delete_end.html
- top.html
- change_my_infomine
- add_form.html
- common_js.html
- data.html
- edit_form.html
- view.html
- comments
- comment_creation_edit.html
- comment_creation_submit.html
- dbm_view_record_set
- begin.html
- data.html
- end.html
- delete_single_record.html
- display_message.html
- email_notification
- list_bottom.html
- list_center.html
- list_confirmed_user_data.html
- list_top.html
- list_unconfirmed_user_data.html
- README
- find_faults
- bottom.html
- fault.html
- message.html
- section_end.html
- section_start.html
- top.html
- Fragments
- category_check_boxes.html
- field_check_boxes.html
- global_replace
- bottom.html
- data.html
- setup.html
- top.html
- import
- dBase
- import_records_bottom.html
- import_records_data.html
- import_records_top.html
- lii_import.html
- scp_import.html
- institutions
- add.html
- interactive_crawler
- start.html
- menu
- main.html
- suggestions.html
- metadata_extractor
- bottom.html
- hidden_inputs.html
- me1_settings.html
- me2_download_error.html
- me2_download_start.html
- me2_download_title.html
- me3_begin.html
- me3_data.html
- me3_end.html
- me3_error.html
- me4_begin.html
- me4_data.html
- me4_end.html
- me4_keyword_error.html
- me5_begin.html
- me5_categories_error.html
- me5_data.html
- me5_end.html
- me6_begin.html
- me6_end.html
- me7_begin.html
- me7_data.html
- me7_end.html
- me7_error.html
- me8_description.html
- me8_error.html
- me9_done.html
- me9_zero_categories.html
- top.html
- my_infomine
- bottom.html
- data.html
- search.html
- top.html
- record_builder
- build_begin.html
- build_data.html
- build_end.html
- expert_guided_crawl_email.txt
- expert_guided_crawl.html
- expert_guided_crawl_task.html
- setup_begin.html
- setup_end.html
- setup_invalid_url.html
- setup_valid_url.html
- suggest.html
- url_filter_begin.html
- url_filter_data.html
- url_filter_end.html
- record_editor
- dup_check
- begin.html
- data_as_full.html
- data_as_summary.html
- end.html
- editor
- alphabetize_javascript.html
- editor.html
- editor_javascript.html
- frameset.html
- words_adding_removing_javascript.html
- helper_view_list
- bottom.html
- data.html
- top.html
- preview
- bottom.html
- section_common_bottom.html
- section_common_data.html
- section_common_top.html
- top.html
- search
- all.html
- live.html
- pending.html
- submit
- error.html
- submit.html
- record_info_stats
- begin.html
- data.html
- end.html
- remote_services
- assign_metadata_record.txt
- augment_nsdl_collection.html
- background.html
- delete_harvest_set.html
- email_notification.txt
- expert_guided_crawl.html
- overview.html
- targeted_link_crawling.html
- replace_field.html
- search_box
- bottom.html
- display.html
- options.html
- top.html
- show_categories
- bottom.html
- data.html
- top.html
- show_words_to_add
- bottom.html
- data.html
- my_infomine_bottom.html
- top.html
- sql_search
- bottom.html
- data.html
- top.html
- sql_search_box
- bottom.html
- categories.html
- hide_categories.html
- top.html
- standard_dbase_manage_bottom.html
- standard_dbase_manage_close_table.html
- standard_dbase_manage_popup_bottom.html
- standard_dbase_manage_popup_top.html
- standard_dbase_manage_top.html
- submit_user_info.html
- suggest.html
- tests
- test_duplicate_checker_begin.html
- test_end.html
- test_metadata_assignment_begin.html
- test_rich_text_finder_begin.html
- theme_editor
- bottom.html
- data.html
- top.html
- undelete_single_record.html
- url_checker
- bottom.html
- data.html
- help.html
- summary.html
- top.html
- user_account
- creation_form.html
- creation_submit.html
- editor.html
- submit.html
- user_info_editor.html
- user_management
- add_user.html
- undelete_adders_submit_bottom.html
- undelete_adders_submit_data.html
- undelete_adders_submit_top.html
- undelete_adders_view_bottom.html
- undelete_adders_view_data.html
- undelete_adders_view_top.html
- view_adders
- bottom.html
- data.html
- top.html
- view_log
- list_files_begin.html
- list_files_data.html
- list_files_end.html
- list_projects_begin.html
- list_projects_data.html
- list_projects_end.html
- view_file_begin.html
- view_file_end.html
- view_my_infomine
- bottom.html
- data.html
- top.html
- view_record
- bottom.html
- modification_bottom.html
- modification_data.html
- modification_top.html
- top.html
- whats_new
- bottom.html
- data.html
- top.html
- iVia (public interface theme)
- alert_service
- common_bottom.incl
- common_header.incl
- common_top.incl
- confirmed.html
- created_bottom.html
- created_data.html
- created_top.html
- editor.html
- header.html
- incomplete_form_error.html
- main.html
- missing_fields_bottom.html
- missing_fields_data.html
- missing_fields_top.html
- modified_bottom.html
- modified_data.html
- modified_top.html
- no_such_user.html
- password_sent.html
- unsubscribed.html
- user_exists.html
- browse
- bottom.html
- child_list_begin.html
- child_list_data.html
- child_list_end.html
- columns_begin.html
- columns_between.html
- columns_end.html
- error_message.html
- page_list_begin.html
- page_list_data.html
- page_list_end.html
- robot_option.html
- title_list_begin.html
- title_list_data.html
- title_list_end.html
- top.html
- canned_search
- bottom.html
- data.html
- error.html
- invalid_search_parameters.html
- top.html
- zero_results.html
- display_message.html
- error.html
- footer.html
- generate_canned_search.html
- header.html
- links.html
- mail_form
- mail_form.html
- remote_services
- assign_metadata.html
- assign_metadata_record.txt
- augment_nsdl_collection.html
- background.html
- delete_harvest_set.html
- email_notification.txt
- expert_guided_crawl.html
- overview.html
- risi_crawl_site
- begin.html
- end.html
- test.html
- search
- search.html
- search_box
- bottom.html
- categories.html
- display.html
- options.html
- top.html
- suggestion
- preview.html
- secure_preview.html
- view_adders
- bottom.html
- multi_data.html
- multi_top.html
- top.html
- view_record
- bottom.html
- top.html
- view_record_set
- begin.html
- end.html
- record_begin.html
- record_data.html
- record_end.html
- whats_new
- bottom.html
- data.html
- top.html
- p_title_desc (Custom theme for canned searches)
- canned_search
- bottom.html
- data.html
- invalid_search_parameters.html
- top.html
- zero_results.html
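As described before the list, a new theme starts life as a copy of an existing one. The sketch below shows one way to make that copy safely. The function name copy_theme and the theme name my_theme are hypothetical; for an adder interface theme the templates directory would be the dbase_manage sub-folder rather than html_templates itself.

```shell
#!/bin/sh
# Sketch: start a new theme by copying an existing theme directory.
# The paths and theme names are supplied by the caller; nothing here is
# read from iVia.conf.

copy_theme() {
    templates_dir=$1   # e.g. IVIA_INSTALLED/data/html_templates
    source_theme=$2    # existing theme to copy, e.g. iVia
    new_theme=$3       # name for the new theme (hypothetical, e.g. my_theme)

    if [ ! -d "$templates_dir/$source_theme" ]; then
        echo "source theme not found: $templates_dir/$source_theme" >&2
        return 1
    fi
    # Refuse to overwrite, so an existing theme is never clobbered.
    if [ -e "$templates_dir/$new_theme" ]; then
        echo "refusing to overwrite existing theme: $new_theme" >&2
        return 1
    fi
    cp -R "$templates_dir/$source_theme" "$templates_dir/$new_theme"
}

# Example (not run automatically):
#   copy_theme /path/to/IVIA_INSTALLED/data/html_templates iVia my_theme
```

After copying, edit the files in the new directory and select the theme in iVia.conf, or name it in the URL as with p_title_desc.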
The Inverted Index Database
Once there is data in the master database, an inverted index database is created to support the iVia search functions.
The iVia inverted index is a Berkeley DB database that associates every word, and many of the phrases, appearing in the collection with a list of the records, offsets, and fields in which they occur. The canned_search program uses these indexes to perform searches.
Warning: Berkeley DB is sometimes unstable under very high loads. If you have trouble with these databases, consider disabling incremental inverted index updates.
The Inverted Index Files
The inverted index databases are stored in the inverted_indexes_dir directory identified in the main iVia.conf file. The inverted indexes comprise a set of Berkeley DB database files (whose filenames end in ".db") and three other files used internally by Berkeley DB for file locking, shared memory management, and similar functions.
To regenerate the inverted index files, run the create_inverted_indexes.sh script. Usually this will be done as part of the regular nightly tasks.
The database files should always be owned by the correct user for your installation, and be readable and writable by that user. They are read by the canned_search program every time a patron performs a search, and whenever you run the word_level_data_dumper command or one of the commands used by create_inverted_indexes.sh. Whenever the databases are accessed, the program also needs to be able to write to the lock files. (Note: This causes problems if your Apache installation runs as a special user such as www-data and the lock files are deleted, because they may be recreated by the www-data user without the necessary read and write permissions. Normally, you will need to fix this manually as root.)
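To spot the ownership problem described in the note, it can help to list the files in the index directory that belong to an unexpected user. The following is a minimal sketch using find; the function name list_misowned_files is hypothetical, and which user is the right owner depends on your installation.

```shell
#!/bin/sh
# Sketch: list inverted index files that are not owned by the expected user.
# The directory and user are supplied by the caller; nothing here is read
# from iVia.conf.

list_misowned_files() {
    index_dir=$1      # the inverted_indexes_dir named in iVia.conf
    expected_user=$2  # user name or numeric ID the files should belong to

    if [ ! -d "$index_dir" ]; then
        echo "not a directory: $index_dir" >&2
        return 2
    fi
    # Print every regular file whose owner differs from the expected user.
    find "$index_dir" -type f ! -user "$expected_user" -print
}

# Example (not run automatically; EXPECTED_USER is a placeholder):
#   list_misowned_files /path/to/inverted_indexes EXPECTED_USER
# A file reported here could then be repaired as root, for instance with
#   chown EXPECTED_USER FILE && chmod u+rw FILE
```

The sketch only reports; it leaves the actual chown and chmod to the administrator, since those normally require root.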
Debugging the Inverted Index Database
If the files become corrupt (searches stop working), fix them as described in the chapter on Troubleshooting.
You can check what iVia data is in a database with the word_level_data_dumper program.
Berkeley DB comes with diagnostic utilities: db_stat and db_recover. (In Debian, these are renamed according to the version in use; for Berkeley DB version 4.2, install the libdb4.2-util package, in which the programs are renamed to db4.2_stat and db4.2_recover.)
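Because the utility names vary between distributions, a script that calls them has to work out which name is installed. The sketch below tries the plain name first and then the version-prefixed names it is given. The function name find_db_tool is hypothetical, and this is not necessarily how the db_detect.sh script supplied with iVia works.

```shell
#!/bin/sh
# Sketch: locate a Berkeley DB utility whose name may carry a version.

find_db_tool() {
    tool=$1   # "stat" or "recover"
    shift     # remaining arguments: versions to try, e.g. 4.2

    # Try the unversioned name first (db_stat, db_recover).
    if command -v "db_$tool" >/dev/null 2>&1; then
        echo "db_$tool"
        return 0
    fi
    # Then try the Debian-style names (db4.2_stat, db4.2_recover).
    for version in "$@"; do
        if command -v "db${version}_$tool" >/dev/null 2>&1; then
            echo "db${version}_$tool"
            return 0
        fi
    done
    return 1
}

# Example (not run automatically): show environment statistics for the
# directory that holds the inverted indexes.
#   stat_cmd=$(find_db_tool stat 4.2) && "$stat_cmd" -e -h /path/to/inverted_indexes
# Recovery takes the same -h option and is best run while no other process
# is using the environment:
#   recover_cmd=$(find_db_tool recover 4.2) && "$recover_cmd" -v -h /path/to/inverted_indexes
```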
iVia provides wrapper scripts around these utilities: check_berkeley_env.sh quickly checks the environment files, and check_inverted_indexes.sh examines the databases for corruption (slow). For added reliability, you may want to run check_berkeley_env.sh as a cron job every five minutes.
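A cron job that runs every five minutes is easier to live with if it stays silent on success and records only failures. The following is a minimal sketch of such a wrapper. The function name run_check and the log location are hypothetical, and the sketch assumes that the check command exits with a non-zero status when it finds a problem, which should be verified for check_berkeley_env.sh before relying on it.

```shell
#!/bin/sh
# Sketch of a cron wrapper around an environment check.
# ASSUMPTION: the check command exits with a non-zero status when it finds
# a problem; confirm this for check_berkeley_env.sh on your installation.

run_check() {
    check_cmd=$1   # e.g. IVIA_INSTALLED/bin/check_berkeley_env.sh
    log_file=$2    # hypothetical log location chosen by the administrator

    if "$check_cmd" >/dev/null 2>&1; then
        return 0
    else
        status=$?
        # Record a timestamped line only when the check fails.
        echo "$(date '+%Y-%m-%d %H:%M:%S') $check_cmd failed with status $status" >> "$log_file"
        return "$status"
    fi
}

# Example (not run automatically): save this function in a script that ends
# with a call such as
#   run_check /path/to/IVIA_INSTALLED/bin/check_berkeley_env.sh /path/to/check.log
# and schedule that script with a crontab line like
#   */5 * * * * /path/to/that_script.sh
```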