DataFountains Documentation

2.2.0



Introduction

Overview

Data Fountains is a tool for discovering and describing Internet resources about a particular topic. After signing on, the user is guided through a series of Web pages that generate information describing a particular topic. For information on installing Data Fountains, please see Installing Data Fountains, and for help with configuring Data Fountains, please see Configuring Data Fountains. For a more detailed user guide, please see our User Documentation.

Internally, each Data Fountains process associated with a chosen crawl type is called a task. Every user can have multiple simultaneous tasks working in parallel to discover resources.

Each task is performed in six distinct phases, as described below. If this seems complicated, don't worry: Data Fountains keeps track of which steps you have completed and which still need to be done.

Most users will work through these steps in order. As each step is completed, a link is displayed that takes the user on to the next step.

Usually, Steps 1 to 3 are performed in one sitting, followed by a wait while the crawl runs. When the crawl results are available, Step 4 begins and the results are reviewed, but users often return to Step 3 and launch a new crawl. Several crawls may be initiated in this fashion until the result set is satisfactory. The user then proceeds to Step 5, specifies the metadata desired, and harvests the metadata records created.
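The iterative workflow above can be sketched as a simple loop. This is a hypothetical model for illustration only, not Data Fountains code: the `Task` class, the step labels, and the numeric quality score with its `good_enough` threshold are all assumptions standing in for the user's own judgment of the result set.

```python
# Hypothetical model of the task workflow: Steps 1-3, repeated crawl/review
# cycles, then Step 5 and the harvest. Names and the quality score are
# illustrative assumptions, not part of Data Fountains itself.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    """Records the order in which a single task's steps were completed."""
    name: str
    completed: List[str] = field(default_factory=list)
    crawls: int = 0

    def complete(self, step: str) -> None:
        self.completed.append(step)


def run_task(task: Task, crawl: Callable[[int], float],
             good_enough: float, max_crawls: int = 5) -> Task:
    # Steps 1 and 2 are usually done in the same sitting as Step 3.
    task.complete("step1")
    task.complete("step2")
    while task.crawls < max_crawls:
        task.complete("step3")          # launch a crawl
        task.crawls += 1
        quality = crawl(task.crawls)    # wait for the crawl results
        task.complete("step4")          # review the results
        if quality >= good_enough:
            break                       # result set is satisfactory
        # Otherwise loop back to Step 3 and launch another crawl.
    task.complete("step5")              # specify the desired metadata
    task.complete("harvest")            # harvest the metadata records
    return task
```

For example, with simulated quality scores of 0.4, 0.6 and 0.9 on successive crawls and a threshold of 0.8, the sketch runs three crawl/review cycles before moving on to Step 5. The `max_crawls` cap is an added assumption to keep the loop finite.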

Several other optional steps are not always used but may be useful. For example, the Customize Crawler Settings step allows you to fine-tune the focused crawler's behavior.

Tending your garden

One way to look at Data Fountains is to compare it to a garden. The user provides seeds and harvests result URLs with metadata. The higher quality (i.e. the more on-topic) the seeds are, the better the results. Weeding, i.e. blacklisting of undesirable URL patterns, will also improve the harvest. Just like in a garden, selection of the best seeds from one generation will produce the best harvest for the next generation.
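To make the garden analogy concrete, the sketch below shows weeding as dropping result URLs that match blacklisted patterns, and seed selection as keeping the most on-topic survivors for the next crawl. It is a hypothetical illustration: the regular-expression blacklist format, the `weed` and `best_seeds` helpers, and the per-URL on-topic scores are assumptions, not the way Data Fountains implements blacklisting or ranking.

```python
# Hypothetical illustration of "weeding" and "seed selection".
# The regex blacklist and on-topic scores are assumptions for illustration.
import re
from typing import Dict, Iterable, List


def weed(urls: Iterable[str], blacklist: Iterable[str]) -> List[str]:
    """Drop every URL that matches any blacklisted regular expression."""
    patterns = [re.compile(p) for p in blacklist]
    return [u for u in urls if not any(p.search(u) for p in patterns)]


def best_seeds(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k URLs with the highest on-topic scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Example: weed out an unwanted host, then keep the two best survivors.
results = {
    "http://a.org/x": 0.9,
    "http://ads.example.com/y": 0.95,
    "http://b.org/z": 0.7,
    "http://c.org/w": 0.2,
}
kept = weed(results, [r"ads\.example\.com"])
next_seeds = best_seeds({u: results[u] for u in kept}, 2)
```

Here the highest-scoring URL is removed by the blacklist before seed selection, so `next_seeds` holds `http://a.org/x` and `http://b.org/z`. This mirrors the point that weeding improves the harvest even when an undesirable page looks on-topic.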