Current R&D Projects
INFOMINE's research and development concentrates on two main areas:
Web Crawling and Metadata Assignment. Unless otherwise noted, the
software described here is part of iVia and is freely available for
download.
Research Project Overview
Web Crawling
INFOMINE uses a range of Web crawlers to discover new Internet
resources:
- The Nalanda iVia Focused
Crawler is a focused Web crawler based on Dr. Soumen Chakrabarti's
pioneering work in this field. This software is available as a
separate project on the download
page.
- The INFOMINE Virtual Library
Crawler is a "Web robot" that uses the academic virtual
libraries cataloged in INFOMINE as starting points to discover new
resources.
- The INFOMINE Automatic Focused
Crawler extracts a set of topics from INFOMINE, and uses the
Nalanda iVia Focused Crawler to look for new resources similar to
each topic for inclusion in INFOMINE. This software is part of the
iVia package, but requires the Nalanda iVia Focused Crawler.
- The Expert-Guided Crawler with
Drill-down is a tool that enables indexers to crawl a Web site
(or a list of Web sites) and discover new resources. This software is
part of the iVia package.
- The Creme De La Crawlers feature automatically rates
robot-created resources by their likely value to the
collection. The most highly rated are flagged for indexers, saving
them time in discovering significant new resources. Records from both
the Virtual Library Crawler and the Automatic Focused Crawler may be
suggested.
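The rate-and-flag step described for Creme De La Crawlers can be illustrated with a minimal sketch. It is not iVia's implementation: the CrawledRecord type, the score field, and the flag_for_indexers function are hypothetical names, and the scores are made-up values standing in for whatever rating the real software computes.

```python
from dataclasses import dataclass


@dataclass
class CrawledRecord:
    """Hypothetical stand-in for a robot-created record."""
    url: str
    source: str   # which crawler produced the record
    score: float  # assumed rating of likely value to the collection


def flag_for_indexers(records, top_n=2):
    """Return the top_n highest-rated records, best first."""
    return sorted(records, key=lambda r: r.score, reverse=True)[:top_n]


# Illustrative data only; the URLs and scores are invented.
records = [
    CrawledRecord("http://example.org/a", "virtual-library", 0.91),
    CrawledRecord("http://example.org/b", "automatic-focused", 0.42),
    CrawledRecord("http://example.org/c", "automatic-focused", 0.77),
]
flagged = flag_for_indexers(records)
for record in flagged:
    print(record.url, record.source, record.score)
```

Records from either crawler compete in the same ranking here, which mirrors the statement that both crawlers' records may be suggested.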
Metadata Assignment
There are currently modules in iVia for assigning a range of
metadata fields, including Title, Creator, Contributor, Publisher,
Key phrases, Library of Congress Subject Headings, Library of Congress
Classification, Description, Language, Format (i.e. Media Type) and
INFOMINE Categories.
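As a rough illustration of the fields listed above, a record could be modelled as follows. This is a sketch only: the MetadataRecord class, its attribute names, and the missing_fields helper are assumptions made for the example and do not reflect iVia's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataRecord:
    """Hypothetical container for the metadata fields named in the text."""
    title: Optional[str] = None
    creator: Optional[str] = None
    contributor: Optional[str] = None
    publisher: Optional[str] = None
    key_phrases: List[str] = field(default_factory=list)
    lcsh: List[str] = field(default_factory=list)  # Subject Headings
    lcc: Optional[str] = None                      # Classification
    description: Optional[str] = None
    language: Optional[str] = None
    media_format: Optional[str] = None             # Format (Media Type)
    infomine_categories: List[str] = field(default_factory=list)


def missing_fields(record):
    """List the fields that no assignment module has filled in yet."""
    return [name for name, value in vars(record).items() if not value]


# Illustrative values only.
record = MetadataRecord(title="Example", language="en",
                        key_phrases=["web crawling"])
missing = missing_fields(record)
print(missing)
```

A helper like missing_fields shows how one might decide which assignment modules still need to run on a given record.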
The iVia Project has developed some other metadata tools that stand
apart from our main metadata assignment library.
- The LCSH to LCC assignment
module automatically assigns each document a classification from the
LCC Outline. The project is led by Dr. Eibe Frank (Department of Computer Science,
The University of Waikato) and Dr. Gordon W. Paynter at
INFOMINE. This software is written in Java and is available as a
separate project on the download page.
- PhraseRate is a tool developed by Keith
Humphreys for extracting a set of meaningful, attractive key phrases
from a Web page.
Project Planning and Exploration
Project Planning and Exploration notes.