This page describes the iVia Creator assignment algorithm. The Publisher and Contributor assignment algorithms are identical except that different META tag names are used, and no evaluation metadata is available.
Creator metadata identifies the person or organization primarily responsible for the creation of a resource, and is often called Author metadata. Documents can have one or more creators, though additional authors may be better described as contributors or publishers, depending on their specific role. The creator-contributor-publisher trichotomy is drawn from the Dublin Core Metadata Element Set.
Creator assignment is a simple extraction process: Creator values are read from the HTML document's META tags. The META tags used are those whose name is creator, dc:creator or author (or a plural form of one of these).
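The extraction step can be sketched with Python's standard html.parser module. This is a minimal illustration, not the iVia code. It assumes that tag names are matched case-insensitively and that a "plural form" means the name with an "s" appended. The names CREATOR_TAG_NAMES, CreatorMetaParser and extract_creators are hypothetical. The sketch returns the raw values, before any post-processing.

```python
from html.parser import HTMLParser

# Assumption: tag names are matched case-insensitively, and a "plural form"
# means the name with an "s" appended (creators, dc:creators, authors).
CREATOR_TAG_NAMES = {"creator", "dc:creator", "author"}
CREATOR_TAG_NAMES |= {name + "s" for name in CREATOR_TAG_NAMES}


class CreatorMetaParser(HTMLParser):
    """Collect the content of META tags whose name is a Creator tag name."""

    def __init__(self):
        super().__init__()
        self.creators = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        content = (attributes.get("content") or "").strip()
        if name in CREATOR_TAG_NAMES and content:
            self.creators.append(content)


def extract_creators(html_text):
    """Return raw Creator values in document order (duplicates kept)."""
    parser = CreatorMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.creators


page = """<html><head>
<meta name="Author" content="John Smith">
<meta name="dc:creator" content="US Forest Service" />
<meta name="keywords" content="forests">
</head><body></body></html>"""
print(extract_creators(page))  # ['John Smith', 'US Forest Service']
```

Self-closing META tags are handled as well, because HTMLParser's default handle_startendtag calls handle_starttag.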
The initial list is then post-processed to remove duplicate entries and blacklisted (undesirable) values.
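The following is a minimal sketch of the post-processing step. It assumes that duplicates are detected by comparing values with whitespace collapsed and case ignored, and that the first occurrence is kept. The blacklist entries are hypothetical, because the real iVia blacklist is not listed on this page. The function names are also illustrative.

```python
# Hypothetical blacklist for illustration; the actual iVia blacklist is not
# documented on this page.
BLACKLIST = {"webmaster", "administrator", "unknown"}


def normalize(value):
    """Collapse internal whitespace and ignore case when comparing values."""
    return " ".join(value.split()).casefold()


def postprocess_creators(values, blacklist=BLACKLIST):
    """Drop empty, blacklisted and duplicate values, keeping first-seen order."""
    blocked = {normalize(entry) for entry in blacklist}
    seen = set()
    result = []
    for value in values:
        key = normalize(value)
        if not key or key in blocked or key in seen:
            continue
        seen.add(key)
        result.append(" ".join(value.split()))  # tidy whitespace, keep case
    return result


raw = ["John Smith", "john  smith", "Webmaster", "US Forest Service"]
print(postprocess_creators(raw))  # ['John Smith', 'US Forest Service']
```

This sketch detects duplicates only by normalized string equality, so it would keep both US Forest Service and USFS. Recognizing variants of that kind would need additional knowledge, such as a list of known aliases.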
We evaluate the Creator assignment algorithm using the iVia metadata evaluation tool and the 1,000 most recently modified INFOMINE records. However, the records do not consistently include creator metadata, and they often include the same value in multiple representations (e.g. United States Forest Service, US Forest Service and USFS in the same record), so the evaluation is of limited use.
Metadata evaluation metrics are explained on the iVia metadata evaluation page.
Creator is a multi-valued field whose values have an unpredictable vocabulary. Values are not always assigned in a consistent format: a personal name could appear as either John Smith or Smith, John. For this reason, content-word precision and recall, which compare individual words rather than whole values, are appropriate measures.
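To illustrate the point, the sketch below compares an assigned value with an expert record under two measures: subfield (whole-value) matching and content-word matching. The tokenizer, the stopword list and the counting of matches as a multiset intersection are all assumptions made for this illustration. They are not necessarily what the iVia evaluation tool does.

```python
import re
from collections import Counter

# Hypothetical stopword list; the evaluation tool's own list is not given here.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the"}


def content_words(values):
    """Lower-cased alphanumeric tokens from all values, minus stopwords."""
    return [word
            for value in values
            for word in re.findall(r"[a-z0-9]+", value.lower())
            if word not in STOPWORDS]


def precision_recall(assigned, expert):
    """Precision and recall, counting matches as a multiset intersection."""
    matches = sum((Counter(assigned) & Counter(expert)).values())
    precision = matches / len(assigned) if assigned else 0.0
    recall = matches / len(expert) if expert else 0.0
    return precision, recall


expert = ["Smith, John", "United States Forest Service"]
assigned = ["John Smith"]

# Subfield comparison: whole values must match exactly.
print(precision_recall(assigned, expert))  # (0.0, 0.0)

# Content-word comparison: word order and punctuation no longer matter.
p, r = precision_recall(content_words(assigned), content_words(expert))
print(round(p, 4), round(r, 4))  # 1.0 0.3333
```

The same name scores zero as a subfield but full precision as content words. Recall stays low because the assigned value omits the second expert value entirely.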
The evaluate_metadata_assignment program will evaluate Creator assignment if the configuration file defines the variable creator = "Creator" in the [Fields] section. Our evaluation uses a slightly different formulation, authors = "Creator", reflecting the fact that (for historical reasons) INFOMINE stores its creator metadata in the "authors" database field. An example of the Creator output is shown below.
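In the [Fields] entry, the key is the database field name and the value is the metadata type evaluated for it. The sketch below reads the entry used for INFOMINE with Python's standard configparser. It assumes the configuration file is INI-formatted, because the full file layout is not shown here. This is purely an illustration and does not show how evaluate_metadata_assignment itself parses its configuration. The function name evaluated_fields is hypothetical.

```python
import configparser

# Only the [Fields] line comes from our evaluation; treating the file as
# INI-formatted is an assumption made for this illustration.
CONFIG_TEXT = """
[Fields]
authors = "Creator"
"""


def evaluated_fields(config_text):
    """Map each database field name to the metadata type evaluated for it."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep field names exactly as written
    parser.read_string(config_text)
    # configparser keeps the quotation marks, so strip them from each value.
    return {field: value.strip('"') for field, value in parser.items("Fields")}


print(evaluated_fields(CONFIG_TEXT))  # {'authors': 'Creator'}
```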
Creators

| Measure | Value |
|---|---|
| Field name | authors |
| Number of examples | 1000 |
| Number of passes | 850 |
| Number of attempts | 150 |
| Number of exact matches | 1 |
| Exact match accuracy | 0.0010 |
| Average length of expert metadata in letters | 63.8 |
| Average length of assigned metadata in letters | 22.6 |
| Average length of expert metadata in words | 8.7 |
| Average length of assigned metadata in words | 3.2 |
| Total number of expert content words | 4472 |
| Total number of assigned content words | 408 |
| Total number of matching content words | 156 |
| Content word precision | 0.3824 |
| Content word recall | 0.0349 |
| Content word f-measure | 0.0639 |
| Total number of expert stemmed content words | 4414 |
| Total number of assigned stemmed content words | 407 |
| Total number of matching stemmed content words | 155 |
| Stemmed content word precision | 0.3808 |
| Stemmed content word recall | 0.0351 |
| Stemmed content word f-measure | 0.0643 |
| Total number of expert subfields | 2233 |
| Total number of assigned subfields | 151 |
| Total number of matching subfields | 4 |
| Subfield precision | 0.0265 |
| Subfield recall | 0.0018 |
| Subfield f-measure | 0.0034 |

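The reported content-word figures can be reproduced from the counts in the output. This assumes that precision is measured against the assigned words, recall against the expert words, and that the f-measure is their harmonic mean. That reading is an assumption, but it matches the reported values. With $M$ matching, $A$ assigned and $E$ expert content words:

$$
\begin{aligned}
P &= \frac{M}{A} = \frac{156}{408} \approx 0.3824, \\
R &= \frac{M}{E} = \frac{156}{4472} \approx 0.0349, \\
F &= \frac{2PR}{P+R} = \frac{2M}{A+E} = \frac{312}{4880} \approx 0.0639.
\end{aligned}
$$

The same formula applied to subfields gives $F = 2 \cdot 4 / (151 + 2233) = 8/2384 \approx 0.0034$, which also agrees with the output.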
| Row | Method | Attempts | Subfield precision | Subfield recall | Content-word precision | Content-word recall |
|---|---|---|---|---|---|---|
| 1 | Current | 151 | 0.0724 | 0.0049 | 0.4125 | 0.0425 |
The Creator metadata evaluation is reported in the table above. Few documents supply Creator metadata, so an assignment was made in only 151 of the 1,000 cases, and recall is correspondingly low. For the metadata that was assigned, content-word precision is high, at 41%, while subfield precision is low, at 7%. This suggests that the metadata provided by the experts tends to be inconsistently formatted: assigned values often share words with the expert values without matching them exactly.
DC-dot also extracts Creator, Contributor and Publisher metadata from META tags. In addition, DC-dot can attempt to assign Publisher metadata to an Internet resource by looking up the owner of the domain name of the Web site hosting the resource.