Chapter 13: How searches are processed

The most common way to find records in an iVia installation is to perform a search, using the default form on the site homepage, a form on one of the other web pages, or a previously created "canned_search" link.

Search parameters

The iVia canned search parameters are documented in Appendix A of the iVia Adders' Manual. The most important are the query term (parameter query) and the fields to search (fields). Other parameters let you set the ranking method (alphabetical or by relevance) or limit the records by creator (expert vs. robot), by access (institution), or by category (categories).
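As an illustration, a canned search link could be assembled from name/value pairs roughly as follows. This is a minimal sketch, not iVia code: the parameter names query and fields come from the text above, while the helper names UrlEncode and BuildQueryString and the example value title for fields are assumptions made for this example. Appendix A of the iVia Adders' Manual remains the reference for the real parameter values.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Percent-encode a value so it can be placed safely in a URL query string.
// Unreserved characters (letters, digits, '-', '_', '.', '~') are kept as-is.
std::string UrlEncode(const std::string &value) {
    static const char hex_digits[] = "0123456789ABCDEF";
    std::string encoded;
    for (unsigned char ch : value) {
        if (std::isalnum(ch) || ch == '-' || ch == '_' || ch == '.' || ch == '~') {
            encoded += static_cast<char>(ch);
        } else {
            encoded += '%';
            encoded += hex_digits[ch >> 4];
            encoded += hex_digits[ch & 0x0F];
        }
    }
    return encoded;
}

// Join name/value pairs into "name1=value1&name2=value2".
// (Hypothetical helper; the pairs are emitted in the order given.)
std::string BuildQueryString(
    const std::vector<std::pair<std::string, std::string>> &params) {
    std::string query_string;
    for (const auto &param : params) {
        assert(!param.first.empty());  // a parameter must have a name
        if (!query_string.empty())
            query_string += '&';
        query_string += UrlEncode(param.first) + "=" + UrlEncode(param.second);
    }
    return query_string;
}
```

For example, BuildQueryString({{"query", "state government"}, {"fields", "title"}}) yields query=state%20government&fields=title, which would then be appended to the address of the canned search program.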

Building a result set

When the canned search program receives a query, it first parses the user's Boolean query and selects from the inverted indexes the complete set of records that exactly match it. A range of filters may then be applied, for example to limit the results to expert records or to records from a given category. The results are then ranked, either alphabetically or by relevance, and displayed to the user.
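The selection and filtering steps can be pictured with a small sketch. It assumes a toy in-memory inverted index that maps each term to the set of record IDs containing it, and a query already reduced to a list of required terms and a list of excluded terms. The names InvertedIndex, SelectRecords, and ApplyFilter are hypothetical; iVia's own index format and query parser are not shown here.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

using RecordId = int;
// Toy inverted index: term -> IDs of the records that contain the term.
using InvertedIndex = std::map<std::string, std::set<RecordId>>;

// Return the posting list for a term, or an empty set if the term is not indexed.
std::set<RecordId> Lookup(const InvertedIndex &index, const std::string &term) {
    const auto entry = index.find(term);
    return entry == index.end() ? std::set<RecordId>() : entry->second;
}

// Select records that contain every required term and none of the excluded
// terms. With no required terms the result is empty.
std::set<RecordId> SelectRecords(const InvertedIndex &index,
                                 const std::vector<std::string> &required,
                                 const std::vector<std::string> &excluded) {
    std::set<RecordId> result;
    bool first = true;
    for (const auto &term : required) {
        const std::set<RecordId> postings = Lookup(index, term);
        if (first) {
            result = postings;
            first = false;
            continue;
        }
        std::set<RecordId> intersection;
        std::set_intersection(result.begin(), result.end(),
                              postings.begin(), postings.end(),
                              std::inserter(intersection, intersection.begin()));
        result.swap(intersection);
    }
    for (const auto &term : excluded) {
        const std::set<RecordId> postings = Lookup(index, term);
        std::set<RecordId> difference;
        std::set_difference(result.begin(), result.end(),
                            postings.begin(), postings.end(),
                            std::inserter(difference, difference.begin()));
        result.swap(difference);
    }
    return result;
}

// Keep only the records accepted by a filter, e.g. "expert-created only".
std::set<RecordId> ApplyFilter(const std::set<RecordId> &records,
                               const std::function<bool(RecordId)> &accept) {
    std::set<RecordId> kept;
    for (RecordId id : records)
        if (accept(id))
            kept.insert(id);
    return kept;
}
```

In this sketch, a query such as STATE GOVERNMENT AND NOT CALIFORNIA would be passed as required terms STATE and GOVERNMENT and excluded term CALIFORNIA. Ranking is a separate step applied to the returned set.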

Ranking the result set by relevance

iVia's result ranking system is particularly sophisticated, having been refined by INFOMINE's librarians over a number of years for optimal results on fielded queries. A number of parameters are used to assign a score to each record in the result set; the records are then output in order of decreasing score.

A score is calculated for each record, based on the number of times each term in the query occurs in that record and the fields in which it occurs. Negated query terms are ignored when calculating the score (e.g. in a search for STATE GOVERNMENT AND NOT CALIFORNIA, the score will be based only on occurrences of STATE and GOVERNMENT).
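The treatment of negated terms can be sketched as follows. The example assumes a deliberately simplified query syntax: whitespace-separated words, the operators AND, OR, and NOT in upper case, and no parentheses or quoted phrases. ScoringTerms is a hypothetical helper, not part of iVia's parser.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Return the query terms that contribute to the relevance score.
// Operators are skipped, and a term immediately preceded by NOT is dropped.
std::vector<std::string> ScoringTerms(const std::string &query) {
    std::vector<std::string> terms;
    std::istringstream tokens(query);
    std::string token;
    bool negate_next = false;
    while (tokens >> token) {
        if (token == "AND" || token == "OR")
            continue;  // operators never score
        if (token == "NOT") {
            negate_next = true;  // the next term is negated
            continue;
        }
        if (!negate_next)
            terms.push_back(token);
        negate_next = false;
    }
    return terms;
}
```

Given the query STATE GOVERNMENT AND NOT CALIFORNIA, this returns STATE and GOVERNMENT only.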

We calculate the score as follows:

The field score is found by adding the number of query terms that appear in the field. Where the field weights are:

Bonuses are added only if there is more than one search term. A phrase bonus of 10.0 is awarded if a phrase match occurs in the field (i.e. the query is a phrase search, or the query consists of simple words and all the words appear in order). An additional subfield bonus of 5.0 is awarded if there is a phrase match and all the search terms also appear as a subfield match. These bonuses are multiplied by the field weight of the field they occur in.

Finally, a penalty of 0.75 is incurred if the record is a robot-created record. This is designed to make expert-created records appear more prominent in the search results (since they are likely to be higher-quality records).
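The scoring rules described above can be put together in a rough sketch. The bonus values 10.0 and 5.0 and the penalty value 0.75 are taken from the text; everything else is an assumption made for illustration: the field weights are supplied by the caller because the real values live in iVia's ScoreWeights class, the field score is assumed to be multiplied by the field weight, per-field results are assumed to be summed, and the robot penalty is assumed to be a multiplicative factor. The names FieldMatch and ScoreRecord are hypothetical.

```cpp
#include <cassert>
#include <vector>

// What was found for one field of a record (hypothetical structure).
struct FieldMatch {
    double field_weight;  // assumed weight; real values are set in ScoreWeights
    int matching_terms;   // field score: query terms found in this field
    bool phrase_match;    // the terms occur as a phrase in this field
    bool subfield_match;  // all terms also match within a single subfield
};

const double PHRASE_BONUS = 10.0;   // from the text
const double SUBFIELD_BONUS = 5.0;  // from the text
const double ROBOT_PENALTY = 0.75;  // from the text; applied here as a factor (assumption)

double ScoreRecord(const std::vector<FieldMatch> &fields,
                   int query_term_count, bool robot_created) {
    assert(query_term_count >= 0);
    double score = 0.0;
    for (const auto &field : fields) {
        double field_total = field.matching_terms;
        // Bonuses apply only to queries with more than one search term.
        if (query_term_count > 1 && field.phrase_match) {
            field_total += PHRASE_BONUS;
            // The subfield bonus requires a phrase match as well.
            if (field.subfield_match)
                field_total += SUBFIELD_BONUS;
        }
        // Field score and bonuses are both scaled by the field weight.
        score += field.field_weight * field_total;
    }
    if (robot_created)
        score *= ROBOT_PENALTY;  // makes expert-created records rank higher
    return score;
}
```

Under these assumptions, a two-term query that matches as a phrase and as a subfield in a single field of weight 2.0 scores (2 + 10 + 5) x 2.0 = 34.0 for an expert-created record and 34.0 x 0.75 = 25.5 for a robot-created one.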

Adjusting the result ranking algorithm

The result ranking algorithm is implemented in the ScoreWeights class in src/libs/iVia/QueryNode.h. You can adjust the weights by updating the constant values in this class and recompiling iVia.