Chapter 7: The URL Checker

The URL Checker is a tool for maintaining the URLs in iVia. A program called the Site Checker regularly checks every record to make sure its URL is still working, and a second program, the URL Checker, is used to examine the Site Checker's results.

The Site Checker

The web sites in the live database are regularly examined to see whether a site has changed or its URL has gone stale. In INFOMINE, each record is checked once a week.

When a record is checked, it is assigned a Status code that indicates whether the record requires human maintenance.

While the Site Checker checks the sites, it can also download several pages from each site for use in iVia full-text searches.

Site Checker Status Codes

Here are the status codes that the Site Checker returns:

The URL Checker

The URL Checker interface is designed to help you quickly find records that need to be maintained. It lets you query the database for records based on their Site Checker status, INFOMINE Category, and the record creator’s institution.

The top of the page contains the URL Checker Search box, which lets you set up a query. You can choose any (or all) of the Site Checker Status results described above, and click on the Search button to get a list of records with that status.

You can filter URL Checker Search results by INFOMINE Category (only records from the selected categories are shown; selecting no categories is the same as selecting them all) or by the institution of the record creator. Finally, there are options to limit the number of records returned (the default is 10), to show expert records only (recommended), and to show local records only (recommended).
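The way these filters combine can be sketched as follows. This is a minimal illustration, not iVia's implementation: the `Record` fields and the `url_checker_search` function are hypothetical names chosen for this example. It does show the two rules stated above, that an empty category selection means "all categories" and that at most 10 records are returned by default. The expert-only and local-only options default to on here only because the manual recommends them; the real form's defaults are not stated.

```python
from dataclasses import dataclass


# Hypothetical record shape: these field names are illustrative
# assumptions, not iVia's actual database schema.
@dataclass
class Record:
    record_id: int
    status: str        # Site Checker Status, e.g. "Failed"
    category: str      # INFOMINE Category
    institution: str   # institution of the record creator
    expert: bool = True
    local: bool = True


def url_checker_search(records, statuses, categories=(), institution=None,
                       limit=10, expert_only=True, local_only=True):
    """Sketch of the URL Checker Search filters (hypothetical function)."""
    wanted_statuses = set(statuses)
    # An empty category selection behaves the same as selecting every category.
    wanted_categories = set(categories)
    results = []
    for rec in records:
        if rec.status not in wanted_statuses:
            continue
        if wanted_categories and rec.category not in wanted_categories:
            continue
        if institution is not None and rec.institution != institution:
            continue
        # Expert-only and local-only default to on here because the manual
        # recommends them; the real form's defaults are an assumption.
        if expert_only and not rec.expert:
            continue
        if local_only and not rec.local:
            continue
        results.append(rec)
        if len(results) >= limit:  # default limit of 10 records
            break
    return results
```

For example, `url_checker_search(records, ["Failed"], categories=["Science"])` would return up to 10 expert, local records in the Science category whose status is Failed.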

The result list presents the records that match your query. The record ID number, title, and URL of each resource are shown; clicking the URL opens it in a new window.

Several standard record editing icons are displayed: the "View Record", "Comment on Record", "Edit Record", and "Delete Record" buttons work as they do elsewhere. The "Clear" icon clears the Site Checker Status for a particular record, which removes the record from the list (until the next time the Site Checker sets its status).

For Redirected URLs, an Update URL icon appears when a suggested new URL has been identified. Pressing the button replaces the record’s current URL (shown and marked "Current URL") with the suggested new URL (also shown, marked "Suggestion"). The suggestion is not always appropriate; see the "Redirected onsite" description above.
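One way to see why a suggestion deserves a second look is sketched below. This chapter does not document how the Site Checker decides that a redirect is "onsite", so both functions are hypothetical. `is_onsite_redirect` assumes that "onsite" means the redirect target has the same hostname as the current URL. `suggestion_needs_review` is an invented heuristic: a deep link that now redirects to the site's home page has often been removed rather than moved, so accepting it would replace a specific resource with a generic front page.

```python
from urllib.parse import urlsplit


def is_onsite_redirect(current_url: str, suggested_url: str) -> bool:
    """Hypothetical check: does the redirect stay on the same host?

    Assumption: "onsite" is judged by hostname alone. The Site Checker's
    real rule is not described in this chapter.
    """
    # urlsplit(...).hostname is already lowercased (or None if absent).
    current_host = urlsplit(current_url).hostname
    suggested_host = urlsplit(suggested_url).hostname
    return current_host is not None and current_host == suggested_host


def suggestion_needs_review(current_url: str, suggested_url: str) -> bool:
    """Hypothetical heuristic: flag onsite redirects to the site's home page."""
    target_path = urlsplit(suggested_url).path
    return is_onsite_redirect(current_url, suggested_url) and target_path in ("", "/")
```

Under these assumptions, a record whose URL `http://www.example.org/docs/report.html` now redirects to `http://www.example.org/` would be flagged, while a redirect to another specific page on the same site would not.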

The result list also has a few features that do not appear anywhere else in iVia. The third major column looks something like this:

Failed
07/13/2004

The topmost element is the record's Site Checker Status, in this case Failed. The middle element is the date on which the Site Checker last changed the status of this record. In this example the date is several weeks in the past, which means the Site Checker has examined the record several times since then and returned the same status, Failed, each time.

Finally, the "No Fix" button tells the Site Checker to ignore this record from now on. If you click "No Fix", the record will never show up in the URL Checker again, even if there is a problem with the site. This is useful, for example, for a record describing a site whose content changes frequently (such as a news site like CNN.com): once you mark it "No Fix", it will no longer be displayed in the list of query results.
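The bookkeeping behind this column can be sketched as below. It is an illustration of the rules described in this section, not the Site Checker's actual code: `CheckedRecord`, `apply_check`, and `shown_in_url_checker` are hypothetical names, and using `None` to mean "no status" (for example, after the "Clear" icon is used) is an assumption. The date moves only when the status changes, and records marked "No Fix" are skipped by the checker and never listed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical record state; field names are illustrative assumptions.
@dataclass
class CheckedRecord:
    record_id: int
    status: Optional[str] = None        # None = no status (e.g. after "Clear")
    status_date: Optional[date] = None  # date the status last changed
    no_fix: bool = False                # set by the "No Fix" button


def apply_check(rec: CheckedRecord, new_status: Optional[str], today: date) -> None:
    """Record one weekly Site Checker result for a record."""
    if rec.no_fix:
        # "No Fix" records are ignored by the Site Checker from now on.
        return
    if new_status != rec.status:
        # The date changes only when the status changes, so a record that
        # stays Failed week after week keeps its original date.
        rec.status = new_status
        rec.status_date = today


def shown_in_url_checker(rec: CheckedRecord) -> bool:
    """A record appears in results only if it has a status and is not "No Fix"."""
    return rec.status is not None and not rec.no_fix
```

For instance, a record first found Failed on 07/13/2004 and found Failed again on each later weekly check would still show 07/13/2004, matching the example column above.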