iVia Creator Metadata Assignment

This page describes the iVia Creator assignment algorithm. The Publisher and Contributor assignment algorithms are identical except that different META tag names are used, and no evaluation metadata is available.

Creator Metadata

Creator metadata identifies the person or organization primarily responsible for the creation of a resource, and is often called Author metadata. Documents can have one or more creators, though additional authors may be better described as contributors or publishers, depending on their specific role. The creator-contributor-publisher trichotomy is drawn from the Dublin Core Metadata Element Set.

The Creator Assignment Algorithm

Creator assignment is a simple extraction process: Creator values are read directly from the HTML document's META tags. Any META tag whose name is creator, dc:creator or author (or a plural form) is used.

The initial list is post-processed to remove duplicate entries and to filter out blacklisted (undesirable) values.
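The extraction and post-processing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the iVia implementation: the class and function names, the exact set of plural tag names, the case-insensitive duplicate test and the blacklist contents are all assumptions made for the example.

```python
from html.parser import HTMLParser

# Tag names taken from the description above; the plural spellings are assumed.
CREATOR_NAMES = {"creator", "creators", "dc:creator", "dc:creators",
                 "author", "authors"}

# Hypothetical blacklist; the actual iVia blacklist is not given here.
BLACKLIST = {"", "unknown", "webmaster", "administrator"}


class CreatorMetaParser(HTMLParser):
    """Collects the content of META tags whose name denotes a creator."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").strip().lower()
        if name in CREATOR_NAMES:
            self.values.append(attr_map.get("content", "").strip())


def extract_creators(html):
    """Return creator values in document order, deduplicated and filtered."""
    parser = CreatorMetaParser()
    parser.feed(html)
    seen = set()
    creators = []
    for value in parser.values:
        key = value.casefold()  # assumed: duplicates are compared case-insensitively
        if key in BLACKLIST or key in seen:
            continue
        seen.add(key)
        creators.append(value)
    return creators
```

For a page that carries the same organization in both an author and a dc:creator tag, the function returns that organization once.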

Creator Assignment Evaluation

We evaluate the Creator assignment algorithm using the iVia metadata evaluation tool and the 1,000 most recently modified INFOMINE records. However, the records do not consistently include creator metadata, and often include the same value in multiple representations (e.g. United States Forest Service, US Forest Service and USFS in the same record), so the evaluation is of limited use.

Evaluation Measures

Metadata evaluation metrics are explained on the iVia metadata evaluation page.

Creator is a multiple-value field whose values have an unpredictable vocabulary. Values are not always assigned in a consistent format, and a personal name could appear as either John Smith or Smith, John. For this reason, content-word precision and recall are appropriate measures.
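To illustrate why word-level measures suit this field, the sketch below computes content-word precision and recall for a single record. It is only an approximation of the idea: the tokenization, the stopword list and the function names are assumptions, and the authoritative definitions are those on the iVia metadata evaluation page.

```python
import re
from collections import Counter

# Hypothetical stopword list; the list used by the evaluation tool is not given here.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the"}


def content_words(value):
    """Lowercase a value, split it into words and drop stopwords."""
    words = re.findall(r"[a-z0-9]+", value.lower())
    return Counter(word for word in words if word not in STOPWORDS)


def content_word_scores(expert_values, assigned_values):
    """Return (precision, recall) over the content words of all values."""
    expert = Counter()
    for value in expert_values:
        expert.update(content_words(value))
    assigned = Counter()
    for value in assigned_values:
        assigned.update(content_words(value))

    # Multiset intersection: each word matches at most as often as it occurs in both.
    matching = sum((expert & assigned).values())
    total_assigned = sum(assigned.values())
    total_expert = sum(expert.values())
    precision = matching / total_assigned if total_assigned else 0.0
    recall = matching / total_expert if total_expert else 0.0
    return precision, recall
```

Under these assumptions, "Smith, John" and "John Smith" score perfect precision and recall even though they would fail an exact string match, while "US Forest Service" measured against "United States Forest Service" matches only two words on each side.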

Running an Evaluation

The evaluate_metadata_assignment program will evaluate Creator assignment if the configuration file has the variable creator = "Creator" defined in the [Fields] section. Our evaluation uses a slightly different formulation, authors = "Creator", reflecting the fact that (for historical reasons) INFOMINE stores its creator metadata in the "authors" database field. An example of the Creator output is shown below.

Creators
Field name: authors
Number of examples:      1000
Number of passes:        850
Number of attempts:      150
Number of exact matches: 1
Exact match accuracy:    0.0010
Average length of expert metadata in letters:   63.8
Average length of assigned metadata in letters: 22.6
Average length of expert metadata in words:   8.7
Average length of assigned metadata in words: 3.2
Total number of expert content words:   4472
Total number of assigned content words: 408
Total number of matching content words: 156
Content word precision: 0.3824
Content word recall:    0.0349
Content word f-measure: 0.0639
Total number of expert stemmed content words:   4414
Total number of assigned stemmed content words: 407
Total number of matching stemmed content words: 155
Stemmed content word precision: 0.3808
Stemmed content word recall:    0.0351
Stemmed content word f-measure: 0.0643
Total number of expert subfields:   2233
Total number of assigned subfields: 151
Total number of matching subfields: 4
Subfield precision: 0.0265
Subfield recall:    0.0018
Subfield f-measure: 0.0034
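The precision, recall and f-measure figures in this output can be recomputed from the reported counts. The sketch below assumes that the f-measure is the usual harmonic mean of precision and recall; that assumption reproduces the values printed above. The function name is hypothetical.

```python
def precision_recall_f(matching, assigned, expert):
    """Compute precision, recall and f-measure from raw counts."""
    precision = matching / assigned if assigned else 0.0
    recall = matching / expert if expert else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Assumed: f-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Counts copied from the sample output: (matching, assigned, expert).
rows = {
    "Content word": (156, 408, 4472),
    "Stemmed content word": (155, 407, 4414),
    "Subfield": (4, 151, 2233),
}

for label, (matching, assigned, expert) in rows.items():
    p, r, f = precision_recall_f(matching, assigned, expert)
    print(f"{label}: precision {p:.4f}, recall {r:.4f}, f-measure {f:.4f}")
```

For example, content-word precision is 156 / 408 = 0.3824 and content-word recall is 156 / 4472 = 0.0349.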

Results

Row  Method   Tries  SFP     SFR     CWP     CWR
1    Current  151    0.0724  0.0049  0.4125  0.0425

SFP and SFR are subfield precision and recall; CWP and CWR are content-word precision and recall.

Discussion

The Creator metadata evaluation is reported in the table above. Few documents supply Creator metadata, so an assignment was made in only 151 of the 1,000 cases, and recall is low. For the metadata that was assigned, content-word precision is comparatively high, at 41%, while subfield precision is low, at 7%, suggesting that the metadata provided by the experts tends to be inconsistently formatted.

Related Work

DC-dot also extracts Creator, Contributor and Publisher metadata from META tags. In addition, DC-dot can attempt to assign Publisher metadata to an Internet resource by looking up the owner of the domain name of the Web site hosting the resource.