The INFOMINE Automatic Focused Crawler is a program that crawls the Web to find pages belonging to the same academic communities as existing INFOMINE resources. Each crawl is focused on a topic, represented by shared Library of Congress Subject Headings. The pages it finds are evaluated both as authorities (resources linked to by many good hubs) and as hubs (pages that link to many good authorities). The best authorities and hubs are added to INFOMINE as automatically created records.
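This page does not say how hub and authority quality is measured. The classic formulation is Kleinberg's HITS algorithm, in which hub and authority scores reinforce each other. The following is a minimal sketch that assumes a HITS-style power iteration over a small, hypothetical link graph. It illustrates the idea only and is not necessarily the scoring the crawler itself uses. The function names and page names are made up for the example.

```python
import math


def _normalise(scores: dict[str, float]) -> None:
    """Scale scores in place to unit L2 norm so they stay bounded."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    for page in scores:
        scores[page] /= norm


def hits(links: dict[str, list[str]], iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Return (hub, authority) scores; `links` maps a page to its outlinks."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages that link to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        _normalise(auth)
        # Hub: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        _normalise(hub)
    return hub, auth


# Hypothetical link graph: three candidate hubs pointing at three resources.
links = {
    "hub1": ["authA", "authB"],
    "hub2": ["authA", "authB", "authC"],
    "hub3": ["authA"],
}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # authA
print(max(hub, key=hub.get))    # hub2
```

Keeping the top-scoring pages from each list would correspond to selecting "the best authorities and hubs". How many pages the crawler actually keeps is not stated here.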
The first step is to build a topic tree of resources known to be about specific topics. The tree is organised according to the Library of Congress Classification. We then cycle through each topic in turn, using the Nalanda iVia Focused Crawler to search for new records related to it.
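This page does not describe how the Nalanda iVia Focused Crawler ranks candidate pages. The following is a minimal sketch that assumes a best-first (priority-queue) focused crawl. It walks a tiny slice of the LC Classification (Q Science, QA Mathematics, QD Chemistry). For each topic that has seed pages, it expands only pages whose keyword-overlap relevance meets a threshold. The seeds, topic terms, threshold, and the in-memory `WEB` that stands in for real HTTP fetches are all hypothetical.

```python
import heapq
from collections.abc import Iterator

# Hypothetical slice of the LC Classification: Q (Science) with
# QA (Mathematics) and QD (Chemistry) as children.
TOPIC_TREE = {"Q": ["QA", "QD"], "QA": [], "QD": []}

# Hypothetical seeds (pages already known to be on-topic) and topic terms.
SEEDS = {"QA": ["seed-math"], "QD": ["seed-chem"]}
TOPIC_TERMS = {"QA": {"algebra", "topology"}, "QD": {"polymer", "catalysis"}}

# Toy in-memory web standing in for real fetches: page -> (words, outlinks).
WEB = {
    "seed-math": ({"algebra", "topology", "course"}, ["math-notes", "chem-lab"]),
    "math-notes": ({"algebra", "lecture"}, ["topology-wiki"]),
    "topology-wiki": ({"topology", "algebra"}, []),
    "seed-chem": ({"polymer", "catalysis"}, ["chem-lab"]),
    "chem-lab": ({"catalysis", "safety"}, ["math-notes"]),
}


def walk_topics(tree: dict[str, list[str]], root: str) -> Iterator[str]:
    """Yield topics depth-first, parents before children."""
    stack = [root]
    while stack:
        topic = stack.pop()
        yield topic
        stack.extend(reversed(tree[topic]))


def relevance(page: str, topic: str) -> float:
    """Fraction of the topic's terms that appear on the page."""
    terms = TOPIC_TERMS[topic]
    return len(WEB[page][0] & terms) / len(terms)


def focused_crawl(topic: str, max_pages: int = 10, threshold: float = 0.5) -> list[str]:
    """Best-first crawl from the topic's seeds, expanding only on-topic pages."""
    frontier = [(-1.0, url) for url in SEEDS.get(topic, [])]
    heapq.heapify(frontier)
    seen = {url for _, url in frontier}
    found: list[str] = []
    while frontier and len(found) < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in WEB:
            continue  # dangling link in the toy graph
        score = relevance(url, topic)
        if score < threshold:
            continue  # off-topic: neither keep it nor follow its links
        found.append(url)
        for link in WEB[url][1]:
            if link not in seen:
                seen.add(link)
                # Children inherit the parent's score as their priority.
                heapq.heappush(frontier, (-score, link))
    return found


# Cycle through the topic tree, crawling every topic that has seeds.
results = {t: focused_crawl(t) for t in walk_topics(TOPIC_TREE, "Q") if t in SEEDS}
print(results)
```

Letting each discovered link inherit its parent's relevance as its priority is one common heuristic for focused crawling. A real crawler would also need to fetch and parse pages, throttle requests, and honour robots.txt.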
The INFOMINE Automatic Focused Crawler is based on the Nalanda iVia Focused Crawler and the INFOMINE libraries. It is highly efficient and highly parallelised, and it respects robots.txt files (usually!).
The User Agent string has the form:
INFOMINE Automatic Crawler/3.0 (see http://infomine.ucr.edu/projects/af_crawler)
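Site owners may want to check whether their robots.txt would exclude this crawler. The following is a minimal sketch that uses Python's standard `urllib.robotparser` with the User Agent string quoted above. The example robots.txt, the `INFOMINE` token in it, and the helper name `allowed` are assumptions. Python's parser matches a record when its User-agent value appears, ignoring case, in the part of the string before the first `/`. The crawler's own matching rules are not documented here.

```python
from urllib.robotparser import RobotFileParser

# User Agent string quoted on this page.
USER_AGENT = "INFOMINE Automatic Crawler/3.0 (see http://infomine.ucr.edu/projects/af_crawler)"

# Hypothetical robots.txt: keep this crawler out of /private/,
# allow every other robot everywhere.
ROBOTS_TXT = """\
User-agent: INFOMINE
Disallow: /private/

User-agent: *
Disallow:
"""


def allowed(url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the example robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("http://example.org/private/report.html"))  # False
print(allowed("http://example.org/public/index.html"))    # True
```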
If you have any questions (or complaints!) about the crawler, please contact crawler at infomine.ucr.edu.