Importing Spreadsheets

Import Spreadsheet using Template

Import Spreadsheet using Template is available under the Import tab for all collection types except Crosswalks, Content Tagsets and Corpora.

This importer lets the user select a spreadsheet and a “template” that will be used to convert and store the spreadsheet data. The “template” is created using the mapping process explained in the Import Spreadsheet using Pattern importer. The imported spreadsheet must have exactly the same structure as the spreadsheet used to develop the template. The names and order of the columns must be exactly the same. If multiple worksheets are used, the order and structure of each worksheet (even for worksheets that are not imported) must be the same.

The mapping can also be created or edited using TopBraid Composer if the mapping capabilities are insufficient and a more complex transformations is required (e.g. concatenation of values). TopBraid Composer’s SPINMAP tool provides a drag-and-drop interface that makes it especially easy to create more complex mappings.

Templates developed with TopBraid Composer must be stored in files with “.tablemap.” in their name (for example, myMapping.tablemap.ttl) and be uploaded to the EDG server to be available to EDG users.

Import Spreadsheet using Pattern

Import Spreadsheet using Pattern is available under the Import tab for all collection types except Crosswalks, Content Tagsets and Corpora.

To use this importer, the worksheet must have a header row with the names of columns. EDG will find the first row with data and will assume that it is the header column.

The Import > Import Spreadsheet Using Pattern link shows the following screen:

TopBraid EDG Import Spreadsheet using Pattern

TopBraid EDG Import Spreadsheet using Pattern

Use Browse to select the spreadsheet file to import. Supported file types are an Excel file (.xls or .xlsx), a tab-separated value (.tsv) file, or a comma-separated value (.csv) file. The file should have the expected extension.

Because an Excel file may have more than one sheet of data, there is an option to specify a sheet index value to identify which sheet to import. The default is 1, for the first sheet. The sheet index counts all sheets in an Excel workbook, including hidden ones. For example, if you enter a 3 here and EDG seems to import the second sheet, there may be a hidden one between the first and second sheet that made the third one look like it was the second one when Excel was displaying the workbook. The Excel online help explains how to check for the existence of hidden sheets.

The Entity type for the imported data identifies the class of the assets you will be importing. Each row in the spreadsheet will be imported as an instance of the selected class and the spreadsheet columns can be mapped to the declared properties of the class. Select Next to continue and specify the mapping.

Note

All imported assets are given the same type. To import assets of a different type, either import the same spreadsheet multiple times, with different mappings, or bulk edit the data after import.

Select Spreadsheet Type

The Select Spreadsheet Type view enumerates five possible (column-wise) spreadsheet layout patterns, showing an example of each pattern.

The 1. No Hierarchy layout is the most common and simplest to use. We recommend users become familiar with it first, before using other patterns.

Note

The No Hierarchy import pattern can create relationships in the imported data, including hierarchical relationships. However, referenced asset must exist prior to the import when using with this pattern. In the mapping, the column used to find existing assets to reference (e.g. hierarchical parents) can be indicated. This can be addressed, for example, by importing the same spreadsheet multiple times using different mappings - perhaps once to create assets and then again to add relationships between assets.

For data explicitly structured as a hierarchy, like a taxonomy, there are four layouts from which to select. The main difference between these four layouts and No Hierarchy is that children and parents can be imported and connected in a single import. EDG will create and relate both the children and their parents in the same import, even if they did not exist prior to the import.

In the hierarchical layouts, each row also indicates its hierarchical path, either explicitly (absolute path, #2, #3, #4) or implicitly (recursive path #5); note that lighter text in the layout patterns indicates optional data. These are complex import patterns that behave differently depending on the URI construction method set for the asset collection. Depending on the URI construction method for the collection, they may not be able to be used.

TopBraid EDG Select Spreadsheet Type Page

TopBraid EDG Select Spreadsheet Type Page

Note

Note the header row of column labels in every layout. The imported spreadsheet must always have a header row.

Below the five layout options, the view will show a sample of the spreadsheet’s actual data. The following image shows a spreadsheet of airport codes.

TopBraid EDG Source Spreadsheet Page

TopBraid EDG Source Spreadsheet Page

Select the layout link that most closely corresponds to the structure of the spreadsheet.

Import No Hierarchy Spreadsheet

After selecting 1. No Hierarchy the Import Spreadsheet page appears where the user defines the data-mapping rules from the spreadsheet columns into the target properties of the class.

The page is broken into sections:

  • Column Mappings

  • Default Concept Scheme, if a Taxonomies collections

  • Unique Identifiers

  • various selectable options

  • prompt to enter the mapping template name if saving the pattern for future use

  • Your Source Spreadsheet, data read from the spreadsheet

When importing into a Taxonomies collection and not selecting a concept scheme, EDG creates a new scheme using the name of a spreadsheet. All concepts that do not have a broader parent nor are defined as top concepts of some scheme, will be made into top concepts of the selected or automatically created scheme.

Column Mapping Section

In the Column Mapping section, the user specifies the target property to which each spreadsheet column corresponds. The target properties are taken from target entity type (class or asset type) identified on the previous page. When column names and property names are similar, EDG automatically proposes the mapping. When mapping a relationship, the user can indicate inverse relationships to the target entities.

Unmapped columns and their data are ignored during the import.

The following example shows an example Column Mapping.

TopBraid EDG Import Spreadsheet Page

TopBraid EDG Import Spreadsheet Page

For the mapping to be successful, the datatype of the spreadsheet cell values should match that of the target property into which the column is being mapped. Do not use the string “abc” in a cell value being mapped into an target property with an integer datatype, for example.

In cases where a string-valued target property supports language tags, the user is can optionally set the Language value for the import to set.

If an imported row will result in a new instance, rather than adding data to an existing instance, then one of columns should be mapped to the label property or th preferred label if a Taxonomies collection. Those are used as the EDG display name, for auto-complete, etc.

When importing relationships EDG needs to find the existing asset to which to add a reference. If the asset does not yet exist, EDG may be able to create a URI as the value of the reference.

After selecting a relationship property from the dropdown, the following methods for directing EDG on how to build relationship values are possible:

  1. If the selected class has a designated Setting a Primary Key for a Class, no additional information is needed. EDG uses the value in each row of the mapped column to form the URI of the new instance according to the primary key definition. This option is demonstrated in the above screenshot – airport country is a relationship from airports to countries and the ontology defines a primary key for the class Country.

  • Therefore, the values in the mapped column must the exact value of the property used as the primary key meaning it must be unique for all instances of the class.

  • In the example, the Country primary key values are 2 character country codes and the Country Code column with those values is mapped to the “airport country” relationship.

  • Note that in this case, imported rows will always construct a reference with that URI structure regardless of whether it exists in the collection or not.

  1. If the values in the mapped column are actually valid URLs, then they can be used “as-is” to be the URI of the referenced asset, as indicated by the associated Use values as URIs label.

  2. If neither option 1 nor option 2 apply, the user selects a property of the related class on which to to match. The property will be used to find assets at the end of the relationship and will only create the relationship if the asset already exists in the collection.

  • For example, if Country did not have a defined primary key, the user could map the Country Code column and select “ISO 3166-2 alphabetic country code” as the property to match.

  • Values of the matching property must be unique across all instances of the class. If duplicate values are found, then the related resource will be assigned arbitrarily.

Option 3 is demonstrated in the screenshot below – after removing the primary key from the class Country.

TopBraid EDG Import Spreadsheet - Country Code Mapping

TopBraid EDG Import Spreadsheet - Country Code Mapping

For inverse relationships, the spreadsheet column represents links from instances of some other class to the instances the import creates. Similar to forward relationships, if an inverse relationship is the chosen mapping, then there is a further choice of which referencing-class property to use to identify the referencing instances.

Note

  • As explained under Other Parameters below, if the target of a relationship has the same entity type as the entity type chosen for the import AND Option 3 is being used then the Override existing values option must be unchecked for the relationship to be created.

  • When importing into a Reference Datasets collection, one of spreadsheet columns must map to the primary-key property of the main entity (class) for the dataset. For example, the screen image above identifies this field as the IATA code.

  • When importing into a Taxonomies collection, the user can select a concept scheme to contain the imported concepts. Otherwise, EDG will create a new concept scheme and make all concepts that have no parents in the spreadsheet its top concepts.

Unique Identifiers

This section explains the logic EDG uses to generate URIs for the imported data.

If the import target class has a Setting a Primary Key for a Class, no selections are required . Instead, use the Column Mapping section to map one of the spreadsheet columns to the primary key property. Otherwise, some selections are required.

The available selections depend on the URI Construction Rules configured for the asset collection.

If the URI Construct Method is label, specify the column(s) to be used to generate the URI of each imported row as shown below.

TopBraid EDG Unique Identifiers Based on Labels

TopBraid EDG Unique Identifiers Based on Labels

Typically, the spreadsheet column containing the label value will map to the Id column #1, leaving the rest of mappings empty. However, it is possible to select a different column to generate URIs or even a combination of columns. When multiple columns are selected, their values are concatenated to form the URI.

The Start of URIs option is also available to modify the default namespace of the collection to be used as the basis of the generated URI for the import.

For a successful import adding new information to existing assets, make URI choices that will match the URIs of those assets.

If the URI Construct Method of the asset collection is either the counter or uuid, a different set of options are needed, as shown in the following figure.

TopBraid EDG Unique Identifiers Based on Counter or UUID

TopBraid EDG Unique Identifiers Based on Counter or UUID

If the import is creating new instances, leave the selection empty and EDG will generate URIs according to the currently configured URI Construction Method for the collection, using the default namespace for the URIs.

If the import is adding information to existing assets:

  • use a spreadsheet containing a column with values that are the URIs of existing assets and select that column as the URI column;

  • match on a property to find the existing assets. Values of this property must be unique for the entity type. If duplicate values are found, assignments will happen arbitrarily.

Other Parameters

Other parameters are located directly below the Unique Identifiers section.

Selecting Overwrite existing values will delete an existing value for a mapped property before adding its new (different) value; otherwise, new values will be added to existing ones.

If the imported rows are adding new data values to existing instances and/or adding new instances, it is best to make sure that the Override existing values option is unchecked. Checking this option has the following consequences:

  • If an instance already exists and has a value for any of the mapped columns, the value will be replaced with the spreadsheet data.

  • Relationships between instances of the same type that rely on matching of values will not be created (because these values may be overridden as part of the processing).

  • When working with a Taxonomies collection, a combination of checked Override existing values and the No Hierarchy pattern will always make imported instances top concepts of a new Concept Scheme, even if they already exist in the Taxonomy and have parent concepts.

Selecting Record each new triple in change history (warning: not recommended for large files) prevents EDG from recording the addition of each new triple in the change history.

Note

This option will not let you build relationships to assets that are members of the same class as the one you are importing – because these values can be replaced as part of the import.

A Preview button on the Import Spreadsheet form shows the RDF triples that would be generated with the currently configured settings. The browser’s Back button returns to the form.

Make this a reusable mapping template is optional and saves all of the settings on this form for later reuse. Reusable mappings are selectable using Import Spreadsheet using Template on the Import tab instead of Import Spreadsheet using Pattern. When used, a drop-down list of the saved template names appears for selection.

When satisfied with the sample data shown on the preview, click the Finish button. EDG will start the import, running it in the background.

Import Using Hierarchical Spreadsheet Patterns

After selecting one of the hierarchical patterns, there are three sections:

  • Column Mappings

  • Hierarchy

  • Unique Identifiers

and, as on the previous page, for convenience an example of the source spreadsheet data is shown.

There are also URI column selections below the Unique Identifiers section, unless the URI Construct Method is counter or uuid, where no option to specify columns to use as URIs appears. EDG will always generate the URIs following the chosen method.

When the asset collection uses the counter or uuid methods, these importers CANNOT be used to overwrite existing hierarchies – they can only create new assets:

  • If the need is to add information to the previously imported hierarchies, use No Hierarchy import.

  • If the need is to add a child tree consisting of new resources to an existing hierarchy, these importers can be used. However, the top of the new tree will not be connected to an already existing parent, that connection must be added after the import.

Most of the options on the hierarchical import pages are the same as those described for the No Hierarchy pattern. Those are not described here again and this section of the guide focuses on the unique aspects of the hierarchy mapping.

For all the hierarchical patterns, select a Hierarchy Property (e.g. “has broader”) to connect items in the hierarchy.

Note

  • All hierarchical levels will be connected using the same relationship.

  • To create different relationships between levels, use the No Hierarchy pattern.

The Generate in inverse direction checkbox will reverse the direction of how the property specified in Hierarchy property is applied.

When working with Taxonomies collections, there is an option to select an existing Concept Scheme. If not specified, the importer creates a new concept scheme using the name of a spreadsheet. All concepts that do not have hierarchical parents will be made top concepts of the scheme.

Path with Separator Pattern

This pattern works ONLY if the URI Construct Method is label.

For Path with Separator spreadsheets, in which a spreadsheet entry such as “World > Europe > France” indicates the hierarchical structure above the term “France”, the Hierarchy mapping section works as follows:

  • Select a column containing the path and type a separator e.g., “>”.

  • Identify the Column containing the last node of each path string

  • In the Column Mapping section, to generate a name for imported resources, make sure to assign some column as the preferred label (in case of Taxonomies) or as the label (for all other asset collections) – to generate labels. This will typically be the same column as the one you selected in the Hierarchy section as containing a last node of the path.

  • Map this column again in the Unique Identifiers section. If not specified, EDG will use row numbers to generate URIs. Alternatively, use other column(s) to generate URIs.

Column-based Tree Pattern

When using this pattern and when URI Construct Method is counter or uuid, the spreadsheet needs a single column containing the label for each asset. When URI Construct Method is label, there is not need for such a column, EDG will assume that the hierarchy columns contain the labels.

For Column-based Trees spreadsheets, the Hierarchy mapping section works as follows:

  • Specify the top and bottom levels of the hierarchy by picking the first and last columns containing hierarchical levels. All hierarchical columns must be located together and sequentially in the spreadsheet.

  • If a column is mapped in the Hierarchy section, DO NOT map it in the Column Mapping section nor the Unique Identifiers section. These sections are used ONLY for mapping columns that do not specify the column based tree.

  • If the URI Construction Method is label, EDG will assume that the hierarchy columns contain labels of respective resources. If not label then, as mentioned above, a separate column (outside of the hierarchy) needs to contains the label for each resource

  • If the URI Construction Method is label, leave the Unique Identifiers section empty and EDG will use values in the hierarchical columns to generate URIs. Only make mappings in this section to override the Label-based approach and use some other values for the URIs.

  • Carefully examine the Column-based Trees sample layout on the Select Spreadsheet Type screen. It is important that each item in the hierarchy has a row of its own. See below for correct and incorrect options.

Column-based Tree Pattern Examples

Column-based Tree Pattern Examples

It is important to remember that, as with all spreadsheet import options, all resources will be imported as members of the same class as selected as the start of the import mapping definition. An import cannot support Level 3 representing countries and Level 2 representing continents.

Path with Fixed-length Segments Pattern

This pattern requires a path column, where values are such that removing a string of a fixed length from a value identifies a parent for a resource on that row. For example, if using 2 character segments and “Australia” has a path column value of “010201”, its parent would be on a row with a path column value “0102” and its parent’s parent would be on a row with path column value “01”.

EDG finds a parent by removing the exact number of characters specified in the segment length from the child’s path column value. The top most items could have a path column value that is different from the segment length e.g., 1 instead of 01.

For Path with fixed-length Segments spreadsheets, the Hierarchy mapping section works as follows:

  • Specify the column with the path values.

  • Specify the length of the segments to use to calculate the parent row.

  • In the Column Mapping section, to generate a name for imported resources, make sure to assign some column as the preferred label (in case of Taxonomies) or as the label (for all other asset collections). Otherwise, labels will not be generated.

  • If the URI Construction Method is label, map the same column again in the Unique Identifiers section. If not specified, EDG will use row numbers to generate URIs. Alternatively, use other column(s) to generate URIs.

Self-join Pattern

For spreadsheets following the Self-Join pattern, the Hierarchy mapping section works as follows:

  • Specify the Column containing the parent ids – this column will not necessarily be used to generate URI, it is simply a way to match children and parents

  • Specify the Column containing the child ids – this column will not necessarily be used to generate URI, it is simply a way to match children and parents

  • In the Column Mapping section, to generate a name for imported resources, make sure to assign some column as the preferred label (in case of Taxonomies) or as the label (for all other asset collections). Otherwise, labels will not be generated. Typically, but not necessarily, this will be the column you used as a Column containing the child ids.

  • If the URI Construction Method is label, map the same column again in the Unique Identifiers section. If not specified, EDG will use row numbers to generate URIs. Alternatively, use other column(s) to generate URIs.

Import Data Set from Spreadsheet

Import Data Set from Spreadsheet is available under the Import tab and is available only for Data Assets collections.

It reads the input spreadsheet and creates an EDG Spreadsheets Workbook instance, and a Spreadsheet DataSet instance that is part of the workbook and with related DataSet Element instances for each spreadsheet column. The import also includes data profiling for each imported column as shown in the following figure.

TopBraid EDG Data Profile Page

TopBraid EDG Data Profile Page

TopBraid EDG Frequencies Page

TopBraid EDG Frequencies Page

Import Crosswalk from Spreadsheet

Import Crosswalk from Spreadsheet is available under the Import tab and is available only for Crosswalk asset collections.

The input spreadsheet must contain two columns:

  • the first column must contain the primary key used to build URIs of resources in the From asset collection;

  • the second column must contain primary key used to build URIs of resources in the To asset collection.

Import Property Definitions (Schema) from a Spreadsheet

A spreadsheet can be used to to create property definitions for a class in an Ontologies collection. For information on how to do this, see the guide on Creating Property Shapes from Spreadsheet Columns.