DataFountains Documentation
2.2.0
Introduction
Overview
Data Fountains is a tool for discovering and describing Internet resources about a particular topic. After signing on, the user is guided through a series of Web pages that generate information describing that topic. For information on installing Data Fountains, please see Installing Data Fountains; for help with configuring it, please see Configuring Data Fountains. For a more detailed user guide, please see our User Documentation.
Internally, each Data Fountains process associated with a chosen crawl type is called a task. Every user can have multiple tasks running in parallel to discover resources.
Each task is performed in six distinct steps, as described below. If this seems complicated, don't worry: Data Fountains keeps track of which steps you have completed and which still need to be done.
- Step 1: Select the crawl type: Select one of the following crawl types:
  - Focused Crawl
  - Expert Guided Crawl
  - Targeted Link Crawl
- Step 2: Enter the seed URLs: Enter a starting URL, or a list of starting URLs, depending on the crawl type. These serve as the starting point(s) for the crawl.
- Step 3: Launch the crawl!: Select the crawl parameters and start the selected crawler.
- Step 4: Review results: Review the results returned by the crawler. If required, update the blacklist and/or add more seed URLs, then repeat Step 3.
- Step 5: Generate metadata: Choose an appropriate data product for your purpose and start the metadata generation process. (This can take a while.)
- Step 6: Export metadata: Export the generated metadata in one of four available formats: CSV (SDF), OAI-PMH harvest, MARC records, or XHTML template-based pages.
Most users will work through these steps in order. As each step is completed, a link is displayed that takes the user on to the next step.
Usually, Steps 1 to 3 are performed in one sitting, followed by a wait while the crawl runs. When the crawl results are available, the user reviews them in Step 4 and often chooses to return to Step 3 and launch a new crawl. Several crawls may be run in this fashion until the result set is satisfactory. The user then proceeds to Step 5 to specify the desired metadata, and finally to Step 6 to export the metadata records that were created.
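The workflow described above, including the loop between Step 4 and Step 3, can be pictured with a small Python sketch. This is only an illustrative model and not part of Data Fountains: the `Task` class, its fields, and the `refine_and_recrawl` method are hypothetical names invented for this example, and the URLs are placeholders.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Step(IntEnum):
    """The six steps of a task, numbered as in the list above."""
    SELECT_CRAWL_TYPE = 1
    ENTER_SEED_URLS = 2
    LAUNCH_CRAWL = 3
    REVIEW_RESULTS = 4
    GENERATE_METADATA = 5
    EXPORT_METADATA = 6


@dataclass
class Task:
    """Hypothetical model of one task (not the real Data Fountains code)."""
    crawl_type: str
    seed_urls: list = field(default_factory=list)
    blacklist: set = field(default_factory=set)
    completed: set = field(default_factory=set)
    crawls_run: int = 0

    def next_step(self):
        """Return the lowest-numbered step not yet completed, or None."""
        for step in Step:
            if step not in self.completed:
                return step
        return None

    def complete(self, step):
        """Mark a step as done; steps must be completed in order."""
        if step != self.next_step():
            raise ValueError(f"{step.name} is not the next pending step")
        if step == Step.LAUNCH_CRAWL:
            self.crawls_run += 1
        self.completed.add(step)

    def refine_and_recrawl(self, extra_seeds=(), blacklist=()):
        """During Step 4, adjust the inputs and reopen Step 3."""
        if self.next_step() != Step.REVIEW_RESULTS:
            raise ValueError("results can only be refined during review")
        self.seed_urls.extend(extra_seeds)
        self.blacklist.update(blacklist)
        self.completed.discard(Step.LAUNCH_CRAWL)


# Steps 1 to 3 in one sitting, then one refinement loop before moving on.
task = Task("Focused Crawl", ["http://example.org/"])
for step in (Step.SELECT_CRAWL_TYPE, Step.ENTER_SEED_URLS, Step.LAUNCH_CRAWL):
    task.complete(step)
task.refine_and_recrawl(extra_seeds=["http://example.org/more"],
                        blacklist=["http://spam.example/"])
task.complete(Step.LAUNCH_CRAWL)      # second crawl
task.complete(Step.REVIEW_RESULTS)    # result set is now satisfactory
```

In this model, refining the results simply marks Step 3 as pending again, which mirrors how a user may run several crawls before continuing to metadata generation.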
Several optional steps are also available; they are not always needed, but can be useful. For example, the Customize Crawler Settings step allows you to fine-tune the focused crawler's behavior.