The Targeted Link Crawl Service
The Targeted Link Crawl Service lets people (and computer programs) use iVia's Metadata Assignment tools to generate metadata describing Internet resources. The most common uses for the Targeted Link Crawl Service are to automatically generate metadata records describing a resource, or to generate suggested metadata values for subsequent editing by a human expert.
When RiSI is enabled, the iVia Adders' Homepage provides a form for submitting URLs to the Metadata Assignment service.
In iVia 5.0.0, the assignment service can assign metadata to HTML and PDF files. The service is polite: it obeys robots.txt files (even though, technically, it is not a robot). The service operates in both foreground and background mode.
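As an illustration of what obeying robots.txt means in practice, the following minimal Python sketch checks whether a set of robots.txt rules permits a given URL to be fetched. It uses Python's standard urllib.robotparser module and is not iVia's own implementation; the user-agent name, the example rules, and the example.org URLs are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines,
    # so no network access is needed for this illustration.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in (
        "http://www.example.org/index.html",
        "http://www.example.org/private/report.html",
    ):
        verdict = "allowed" if is_allowed(EXAMPLE_ROBOTS, "example-agent", url) else "disallowed"
        print(url, verdict)
```

A polite service would skip any URL for which such a check returns False.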
This chapter will explain how to use the Targeted Link Crawl Service in foreground and in background mode.
Assigning metadata to a Web page in foreground mode
In foreground mode, the Targeted Link Crawl service is invoked by running a CGI script over HTTP, and the results are immediately returned in the HTTP response. In other words, a request can be made by submitting a form using a Web browser, and the results will then be displayed by the Web browser.
The Targeted Link Crawl service can be requested using the form available from the Adders' Homepage under RiSI.
To use the form, enter a URL in the URLs field, make sure the Mode field is set to "Foreground: display results in Web page", and press the Submit query button. Metadata describing the URL will be returned in your Web browser as a text document.
If you require metadata about several URLs, you can enter them all in the URLs box in the Web form, with one URL per line.
Computer programs can call the risi_assign_metadata CGI script directly, without going through the form. For more detail, see the CGI Parameters chapter.
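A program calling the script directly might build its request along the following lines. This is only a sketch: the host name, the script path, and the parameter names urls and mode are hypothetical placeholders, and the real names and accepted values should be taken from the CGI Parameters chapter.

```python
from urllib.parse import urlencode

# Hypothetical location of the CGI script; substitute your own iVia server.
CGI_URL = "http://ivia.example.org/cgi-bin/risi_assign_metadata"


def build_request_url(urls, mode="foreground", cgi_url=CGI_URL):
    """Build a GET request URL for the metadata assignment CGI script.

    The parameter names "urls" and "mode" are assumptions for this
    sketch, not confirmed iVia parameter names.
    """
    # Mirror the Web form: several URLs are sent as one value, one URL per line.
    query = urlencode({"urls": "\n".join(urls), "mode": mode})
    return f"{cgi_url}?{query}"


if __name__ == "__main__":
    request_url = build_request_url(
        ["http://www.example.org/", "http://www.example.org/about.html"]
    )
    print(request_url)
```

The resulting URL could then be fetched with urllib.request.urlopen, and in foreground mode the response body would be expected to contain the metadata as text.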
Assigning metadata to a set of pages in background mode
If you are generating metadata for a large number of URLs, it may be more useful to perform the operation in background mode. In this case, the request is made with the Mode parameter set to Background and a Harvest tag provided.
In background mode, the metadata is not returned to the Web browser. Instead, an iVia metadata record is created for each URL, metadata is assigned to the record, and the record is stored in the iVia database. These activities are logged in a task log.
Later, the records can be harvested by OAI-PMH using the harvest tag as an OAI-PMH set name. The location of the OAI-PMH server and an example ListRecords query appear in the main log file for easy reference.
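For orientation, a ListRecords query that uses a harvest tag as the set name can be assembled as in the sketch below. The verb, metadataPrefix, and set arguments are standard OAI-PMH, and oai_dc is the metadata format every OAI-PMH repository must support. The server address and harvest tag shown are hypothetical; use the values reported in your log file.

```python
from urllib.parse import urlencode


def list_records_url(base_url, harvest_tag, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request restricted to one set."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        # The harvest tag supplied with the background request acts as the set name.
        "set": harvest_tag,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical server location and harvest tag, for illustration only.
    print(list_records_url("http://ivia.example.org/oai", "my_harvest"))
```

If the server returns a resumptionToken in its response, the harvester would repeat the request with that token to retrieve the remaining records.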