iVia Media Type Metadata Assignment

This page describes the iVia Media Type assignment algorithm.

Media Type Metadata

iVia describes the format of each Internet resource with a single Media Type (also known as a MIME type) value, such as text/html or application/pdf. Media Type values are registered with IANA, which publishes the authoritative list.

The Media Type Assignment Algorithm

If an HTTP header is available for a resource, iVia attempts to assign a Media Type based on the Content-Type field. In the absence of a header, the Media Type is assigned by using the libmagic library (the basis of the common Unix file command) to analyze the document. libmagic determines the Media Type of a document using a database of rules that map the distinctive low-level features of many file types to their Media Types. For example, the rule below states that if position 0 in a file contains the string %PDF-, then the Media Type of the file is application/pdf.

0    string     %PDF-     application/pdf

On the author's Linux workstation, the libmagic rule database contains 308 rules, which identify 183 different file types.
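
The assignment steps described in this section can be sketched roughly as follows. This is a minimal illustration, not iVia's actual code: the function names, the small MAGIC_RULES table, and the decision to fall back to content analysis when a header lacks a Content-Type field are all assumptions made for the example. Only the %PDF- rule is taken from the text above, and a real implementation would call libmagic rather than a hand-written table.

```python
from typing import Mapping, Optional

# Hypothetical stand-in for libmagic's rule database: (offset, bytes, media type).
# Only the %PDF- rule comes from this page; the other entries are illustrative.
MAGIC_RULES = [
    (0, b"%PDF-", "application/pdf"),
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"GIF8", "image/gif"),
]


def media_type_from_header(headers: Mapping[str, str]) -> Optional[str]:
    """Return the Media Type named in the Content-Type field, if any."""
    for name, value in headers.items():
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=utf-8" and normalize case.
            media_type = value.split(";", 1)[0].strip().lower()
            return media_type or None
    return None


def media_type_from_content(data: bytes) -> Optional[str]:
    """Return the Media Type of the first rule whose bytes match at its offset."""
    for offset, magic, media_type in MAGIC_RULES:
        if data[offset:offset + len(magic)] == magic:
            return media_type
    return None


def assign_media_type(headers: Optional[Mapping[str, str]],
                      data: bytes) -> Optional[str]:
    """Prefer the HTTP header; otherwise analyze the document content."""
    if headers is not None:
        media_type = media_type_from_header(headers)
        if media_type is not None:
            return media_type
    # Assumption: a header without a usable Content-Type is treated like
    # a missing header, so the content rules are consulted next.
    return media_type_from_content(data)
```

With this sketch, a resource served with the header "Content-Type: text/html; charset=utf-8" is assigned text/html without its content being inspected, while a resource with no header whose first bytes are %PDF- is assigned application/pdf.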

Media Type Assignment Evaluation

We have not evaluated Media Type assignment because we do not have a suitable test set.