Importing for a Corpus

Import Single Document is available under the Imports tab for Corpora collections only. It lets you manually import an external file into the corpus, instead of going through a connector or importing an existing RDF representation of the corpus.

After selecting this option, use the Browse… button to choose a source file. Its text and metadata are parsed by the Apache Tika content analysis toolkit, which determines the set of supported file formats. On the next screen, the Show Imported data button lets you review the extracted information. For most supported file formats, it shows three sections:

  1. common Metadata Properties such as file name, media type, title, creator;

  2. Content, which is the actual document’s text (where applicable);

  3. Other Properties, which are properties the importer could not label and which are therefore listed by their URIs.
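
The three-way split described above can be illustrated with a minimal Python sketch. It is not the importer's actual implementation: the label table `KNOWN_LABELS`, the `CONTENT_KEY` constant, the `split_sections` function, and the `http://example.org/...` URIs are all hypothetical names introduced here for illustration. The sketch assumes only that extracted properties arrive as URI/value pairs and that a property without a known label falls through to Other Properties under its URI.

```python
from typing import Dict, Optional, Tuple

# Hypothetical label table (an assumption, not the importer's real mapping):
# property URI -> human-readable label shown under Metadata Properties.
KNOWN_LABELS: Dict[str, str] = {
    "http://purl.org/dc/terms/title": "title",
    "http://purl.org/dc/terms/creator": "creator",
    "http://purl.org/dc/terms/format": "media type",
    "http://example.org/prop/fileName": "file name",
}

# Hypothetical URI under which the extracted document text is stored.
CONTENT_KEY = "http://example.org/prop/content"


def split_sections(
    extracted: Dict[str, str],
) -> Tuple[Dict[str, str], Optional[str], Dict[str, str]]:
    """Split extracted properties into (metadata, content, other)."""
    metadata: Dict[str, str] = {}
    other: Dict[str, str] = {}
    content: Optional[str] = None
    for uri, value in extracted.items():
        if uri == CONTENT_KEY:
            # The document text, if the format has any.
            content = value
        elif uri in KNOWN_LABELS:
            # A property the label table recognizes: show it by label.
            metadata[KNOWN_LABELS[uri]] = value
        else:
            # No label available: keep the raw URI as the key.
            other[uri] = value
    return metadata, content, other


# Example input with one unlabeled property (hypothetical pageCount URI).
sample = {
    "http://purl.org/dc/terms/title": "Annual Report",
    "http://purl.org/dc/terms/creator": "J. Doe",
    "http://example.org/prop/fileName": "report.pdf",
    "http://example.org/prop/pageCount": "3",
    CONTENT_KEY: "Full text of the report.",
}
metadata, content, other = split_sections(sample)
```

In this sketch, `other` ends up holding only the `pageCount` entry, keyed by its full URI, which mirrors how unlabeled properties appear in the Other Properties section. A file with no extractable text would simply leave `content` as `None`.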