Importing Data into an Asset Collection

TopBraid EDG lets users load metadata and data from external sources such as spreadsheets, RDBMSs, DDL files, JSON, RDF files, XML, RSS feeds, SPARQL endpoints and other formats, with flexible mapping to EDG models. Pre-built import options are available under the Import tab for an asset collection or a workflow (working) copy of an asset collection. The Import tab is visible only to users with at least Editor privilege. The available importers vary depending on the collection type. Users may also not see exactly the same importers because:

  • Additional importers may be configured by an organization using Import modules that are part of TopBraid EDG’s platform. In that case, the importer is listed as one of the available options alongside the pre-built importers.

  • Some or all of the pre-built importers may also be deactivated by an organization. Deactivation can be done on a per-collection basis using Configure Features under the Manage tab. In that case, fewer importers will be visible than described here.

In addition to being available under the Import tab, all importers can be executed as services. Dynamic imports from Linked Data sources, including the refreshing of previously loaded values, are executed using options available under the Transform tab. Importers described here are effectively “pull” operations where EDG is pulling data through various mechanisms and connectors.

Tip: Using GraphQL and SPARQL endpoints to push data into EDG

EDG is an open solution: in addition to the “pull” importers, external systems can push data into EDG by updating it through its GraphQL and SPARQL endpoints.
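
As a minimal sketch of such a push, the following Python builds a standard SPARQL 1.1 Update request using only the standard library. The endpoint URL, graph name and example resource are hypothetical placeholders, and authentication is omitted; the real values depend on how the EDG server is configured.

```python
import urllib.request

# Hypothetical values: replace with the server's SPARQL endpoint and target graph.
ENDPOINT = "https://edg.example.org/sparql"
GRAPH = "urn:example:my_collection"


def build_update_request(endpoint: str, graph: str, subject: str, label: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Update POST that inserts one label."""
    # Escape backslashes and double quotes so the label is a valid SPARQL string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    update = (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f"INSERT DATA {{ GRAPH <{graph}> {{ <{subject}> rdfs:label \"{escaped}\" }} }}"
    )
    return urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


request = build_update_request(
    ENDPOINT, GRAPH, "http://example.org/airport/JFK", "John F. Kennedy International"
)
# To actually send it: urllib.request.urlopen(request), after adding authentication.
```

The request is only constructed here, so the sketch can be inspected without a running server.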

Importing Spreadsheets

Import Spreadsheet using Template

Import Spreadsheet using Template is available under the Import tab for all collection types except Crosswalks, Content Tagsets and Corpora.

This importer lets the user select a spreadsheet and a “template” that will be used to convert and store the spreadsheet data. The “template” is created using the mapping process explained in the Import Spreadsheet using Pattern importer. The imported spreadsheet must have exactly the same structure as the spreadsheet used to develop the template. The names and order of the columns must be exactly the same. If multiple worksheets are used, the order and structure of each worksheet (even for worksheets that are not imported) must be the same.

The mapping can also be created or edited using TopBraid Composer if the built-in mapping capabilities are insufficient and more complex transformations are required (e.g. concatenation of values). TopBraid Composer’s SPINMAP tool provides a drag-and-drop interface that makes it especially easy to create more complex mappings.

Templates developed with TopBraid Composer must be stored in files with “.tablemap.” in their name (for example, myMapping.tablemap.ttl) and be uploaded to the EDG server to be available to EDG users.

Import Spreadsheet using Pattern

Import Spreadsheet using Pattern is available under the Import tab for all collection types except Crosswalks, Content Tagsets and Corpora.

To use this importer, the worksheet must have a header row with the names of the columns. EDG will find the first row with data and will assume that it is the header row.

The Import > Import Spreadsheet Using Pattern link shows the following screen:

TopBraid EDG Import Spreadsheet using Pattern


Use Browse to select the spreadsheet file to import. Supported file types are an Excel file (.xls or .xlsx), a tab-separated value (.tsv) file, or a comma-separated value (.csv) file. The file should have the expected extension.

Because an Excel file may have more than one sheet of data, there is an option to specify a sheet index value to identify which sheet to import. The default is 1, for the first sheet. The sheet index counts all sheets in an Excel workbook, including hidden ones. For example, if you enter a 3 here and EDG seems to import the second sheet, there may be a hidden one between the first and second sheet that made the third one look like it was the second one when Excel was displaying the workbook. The Excel online help explains how to check for the existence of hidden sheets.

The Entity type for the imported data identifies the class of the assets you will be importing. Each row in the spreadsheet will be imported as an instance of the selected class and the spreadsheet columns can be mapped to the declared properties of the class. Select Next to continue and specify the mapping.

Note

All imported assets are given the same type. To import assets of a different type, either import the same spreadsheet multiple times, with different mappings, or bulk edit the data after import.

Select Spreadsheet Type

The Select Spreadsheet Type view enumerates five possible (column-wise) spreadsheet layout patterns, showing an example of each pattern.

The 1. No Hierarchy layout is the most common and simplest to use. We recommend users become familiar with it first, before using other patterns.

Note

The No Hierarchy import pattern can create relationships in the imported data, including hierarchical relationships. However, referenced assets must exist prior to the import when using this pattern. In the mapping, the column used to find existing assets to reference (e.g. hierarchical parents) can be indicated. If the referenced assets do not exist yet, the same spreadsheet can be imported multiple times using different mappings, for example once to create assets and then again to add relationships between assets.

For data explicitly structured as a hierarchy, like a taxonomy, there are four layouts from which to select. The main difference between these four layouts and No Hierarchy is that children and parents can be imported and connected in a single import. EDG will create and relate both the children and their parents in the same import, even if they did not exist prior to the import.

In the hierarchical layouts, each row also indicates its hierarchical path, either explicitly (absolute path, #2, #3, #4) or implicitly (recursive path #5); note that lighter text in the layout patterns indicates optional data. These are complex import patterns that behave differently depending on the URI construction method set for the asset collection. With some URI construction methods, a given pattern may not be usable at all.

TopBraid EDG Select Spreadsheet Type Page


Note

Note the header row of column labels in every layout. The imported spreadsheet must always have a header row.

Below the five layout options, the view will show a sample of the spreadsheet’s actual data. The following image shows a spreadsheet of airport codes.

TopBraid EDG Source Spreadsheet Page


Select the layout link that most closely corresponds to the structure of the spreadsheet.

Import No Hierarchy Spreadsheet

After selecting 1. No Hierarchy, the Import Spreadsheet page appears, where the user defines the data-mapping rules from the spreadsheet columns into the target properties of the class.

The page is broken into sections:

  • Column Mappings

  • Default Concept Scheme, for a Taxonomies collection

  • Unique Identifiers

  • various selectable options

  • prompt to enter the mapping template name if saving the pattern for future use

  • Your Source Spreadsheet, data read from the spreadsheet

When importing into a Taxonomies collection without selecting a concept scheme, EDG creates a new scheme using the name of the spreadsheet. All concepts that neither have a broader parent nor are defined as top concepts of some scheme will be made into top concepts of the selected or automatically created scheme.

Column Mapping Section

In the Column Mapping section, the user specifies the target property to which each spreadsheet column corresponds. The target properties are taken from target entity type (class or asset type) identified on the previous page. When column names and property names are similar, EDG automatically proposes the mapping. When mapping a relationship, the user can indicate inverse relationships to the target entities.

Unmapped columns and their data are ignored during the import.

The following figure shows an example column mapping.

TopBraid EDG Import Spreadsheet Page


For the mapping to be successful, the datatype of the spreadsheet cell values should match that of the target property into which the column is being mapped. For example, do not use the string “abc” in a cell value being mapped into a target property with an integer datatype.

In cases where a string-valued target property supports language tags, the user can optionally set the Language value that the import assigns.

If an imported row will result in a new instance, rather than adding data to an existing instance, then one of the columns should be mapped to the label property, or to the preferred label for a Taxonomies collection. Those are used as the EDG display name, for auto-complete, etc.

When importing relationships EDG needs to find the existing asset to which to add a reference. If the asset does not yet exist, EDG may be able to create a URI as the value of the reference.

After selecting a relationship property from the dropdown, the following methods for directing EDG on how to build relationship values are possible:

  1. If the selected class has a designated primary key (see Setting a Primary Key for a Class), no additional information is needed. EDG uses the value in each row of the mapped column to form the URI of the new instance according to the primary key definition. This option is demonstrated in the above screenshot: airport country is a relationship from airports to countries and the ontology defines a primary key for the class Country.

  • Therefore, the values in the mapped column must be the exact values of the property used as the primary key, which means they must be unique across all instances of the class.

  • In the example, the Country primary key values are 2 character country codes and the Country Code column with those values is mapped to the “airport country” relationship.

  • Note that in this case, imported rows will always construct a reference with that URI structure regardless of whether it exists in the collection or not.

  2. If the values in the mapped column are actually valid URLs, then they can be used “as-is” as the URI of the referenced asset, as indicated by the associated Use values as URIs label.

  3. If neither option 1 nor option 2 applies, the user selects a property of the related class on which to match. The property is used to find assets at the end of the relationship, and the relationship is created only if the asset already exists in the collection.

  • For example, if Country did not have a defined primary key, the user could map the Country Code column and select “ISO 3166-2 alphabetic country code” as the property to match.

  • Values of the matching property must be unique across all instances of the class. If duplicate values are found, then the related resource will be assigned arbitrarily.
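
The three options can be summarized with a small illustrative model. This is only a sketch of the behaviour described above, not EDG's implementation: the URI template, the lookup table and the function name are hypothetical.

```python
from typing import Optional

# Hypothetical stand-ins for what the ontology and the collection would provide.
COUNTRY_URI_TEMPLATE = "http://example.org/country/{key}"    # assumed primary-key URI pattern
EXISTING_BY_CODE = {"US": "http://example.org/country/usa"}  # match-property value -> asset URI


def resolve_reference(cell: str, method: str) -> Optional[str]:
    """Return the URI to reference for one cell value, or None if no link is made."""
    if method == "primary_key":
        # Option 1: a URI is always built, whether or not the asset exists.
        return COUNTRY_URI_TEMPLATE.format(key=cell)
    if method == "as_uri":
        # Option 2: the cell already holds the URI of the referenced asset.
        return cell
    if method == "match_property":
        # Option 3: a link is made only when an existing asset has this property value.
        return EXISTING_BY_CODE.get(cell)
    raise ValueError(f"unknown method: {method}")
```

For example, `resolve_reference("FR", "primary_key")` builds a URI even though no such country exists in the lookup table, whereas `resolve_reference("FR", "match_property")` returns `None` and no relationship would be created.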

Option 3 is demonstrated in the screenshot below – after removing the primary key from the class Country.

TopBraid EDG Import Spreadsheet - Country Code Mapping


For inverse relationships, the spreadsheet column represents links from instances of some other class to the instances the import creates. Similar to forward relationships, if an inverse relationship is the chosen mapping, then there is a further choice of which referencing-class property to use to identify the referencing instances.

Note

  • As explained under Other Parameters below, if the target of a relationship has the same entity type as the entity type chosen for the import AND Option 3 is being used then the Override existing values option must be unchecked for the relationship to be created.

  • When importing into a Reference Datasets collection, one of the spreadsheet columns must map to the primary-key property of the main entity (class) for the dataset. For example, the screen image above identifies this field as the IATA code.

  • When importing into a Taxonomies collection, the user can select a concept scheme to contain the imported concepts. Otherwise, EDG will create a new concept scheme and make all concepts that have no parents in the spreadsheet its top concepts.

Unique Identifiers

This section explains the logic EDG uses to generate URIs for the imported data.

If the import target class has a primary key (see Setting a Primary Key for a Class), no selections are required. Instead, use the Column Mapping section to map one of the spreadsheet columns to the primary key property. Otherwise, some selections are required.

The available selections depend on the URI Construction Rules configured for the asset collection.

If the URI Construct Method is label, specify the column(s) to be used to generate the URI of each imported row as shown below.

TopBraid EDG Unique Identifiers Based on Labels


Typically, the spreadsheet column containing the label value will map to the Id column #1, leaving the rest of mappings empty. However, it is possible to select a different column to generate URIs or even a combination of columns. When multiple columns are selected, their values are concatenated to form the URI.

The Start of URIs option is also available to modify the default namespace of the collection to be used as the basis of the generated URI for the import.
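
To make the concatenation concrete, here is a hedged sketch of how a URI could be derived from one or more Id columns. The namespace, the column names, the absence of a separator and the percent-encoding step are all assumptions for illustration; EDG's actual encoding rules may differ.

```python
from urllib.parse import quote


def build_uri(namespace: str, row: dict, id_columns: list) -> str:
    """Concatenate the selected column values and append them to the namespace."""
    local_name = "".join(str(row[column]) for column in id_columns)
    # Percent-encode characters that are not safe in a URI (an assumption, see above).
    return namespace + quote(local_name, safe="")


row = {"IATA": "JFK", "Country": "US", "City": "New York"}
single = build_uri("http://example.org/airports#", row, ["IATA"])
combined = build_uri("http://example.org/airports#", row, ["IATA", "Country"])
```

When adding data to existing assets, the same column selection and namespace must reproduce the URIs those assets already have.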

For a successful import adding new information to existing assets, make URI choices that will match the URIs of those assets.

If the URI Construct Method of the asset collection is either counter or uuid, a different set of options is shown, as in the following figure.

TopBraid EDG Unique Identifiers Based on Counter or UUID


If the import is creating new instances, leave the selection empty and EDG will generate URIs according to the currently configured URI Construction Method for the collection, using the default namespace for the URIs.

If the import is adding information to existing assets, do one of the following:

  • use a spreadsheet containing a column with values that are the URIs of existing assets and select that column as the URI column; or

  • match on a property to find the existing assets. Values of this property must be unique for the entity type. If duplicate values are found, assignments will happen arbitrarily.

Other Parameters

Other parameters are located directly below the Unique Identifiers section.

Selecting Overwrite existing values will delete an existing value for a mapped property before adding its new (different) value; otherwise, new values will be added to existing ones.

If the imported rows are adding new data values to existing instances and/or adding new instances, it is best to make sure that the Override existing values option is unchecked. Checking this option has the following consequences:

  • If an instance already exists and has a value for any of the mapped columns, the value will be replaced with the spreadsheet data.

  • Relationships between instances of the same type that rely on matching of values will not be created (because these values may be overridden as part of the processing).

  • When working with a Taxonomies collection, a combination of checked Override existing values and the No Hierarchy pattern will always make imported instances top concepts of a new Concept Scheme, even if they already exist in the Taxonomy and have parent concepts.
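
The difference between replacing and adding values can be sketched as follows. This is an illustrative model only: the dictionary-of-sets representation and the property names are assumptions, not how EDG stores data.

```python
def apply_row(existing: dict, row: dict, overwrite: bool) -> dict:
    """Merge one spreadsheet row into an instance.

    existing maps each property to a set of values; row maps each mapped
    property to the single value found in the spreadsheet.
    """
    result = {prop: set(values) for prop, values in existing.items()}
    for prop, value in row.items():
        if overwrite:
            # Option checked: the old values of a mapped property are dropped first.
            result[prop] = {value}
        else:
            # Option unchecked: the new value is added alongside existing ones.
            result.setdefault(prop, set()).add(value)
    return result


instance = {"city": {"NYC"}, "code": {"JFK"}}
replaced = apply_row(instance, {"city": "New York"}, overwrite=True)
added = apply_row(instance, {"city": "New York"}, overwrite=False)
```

In both cases, properties that are not mapped (here `code`) are left untouched.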

Selecting Record each new triple in change history (warning: not recommended for large files) makes EDG record the addition of each new triple in the change history.

Note

The Overwrite existing values option will not let you build relationships to assets that are members of the same class as the one you are importing, because these values can be replaced as part of the import.

A Preview button on the Import Spreadsheet form shows the RDF triples that would be generated with the currently configured settings. The browser’s Back button returns to the form.

Make this a reusable mapping template is optional and saves all of the settings on this form for later reuse. Reusable mappings are selectable using Import Spreadsheet using Template on the Import tab instead of Import Spreadsheet using Pattern. When used, a drop-down list of the saved template names appears for selection.

When satisfied with the sample data shown on the preview, click the Finish button. EDG will start the import, running it in the background.

Import Using Hierarchical Spreadsheet Patterns

After selecting one of the hierarchical patterns, there are three sections:

  • Column Mappings

  • Hierarchy

  • Unique Identifiers

and, as on the previous page, for convenience an example of the source spreadsheet data is shown.

There are also URI column selections below the Unique Identifiers section, except when the URI Construct Method is counter or uuid. In that case, no option to specify URI columns appears and EDG always generates the URIs following the chosen method.

When the asset collection uses the counter or uuid methods, these importers CANNOT be used to overwrite existing hierarchies – they can only create new assets:

  • If the need is to add information to the previously imported hierarchies, use No Hierarchy import.

  • If the need is to add a child tree consisting of new resources to an existing hierarchy, these importers can be used. However, the top of the new tree will not be connected to an already existing parent; that connection must be added after the import.

Most of the options on the hierarchical import pages are the same as those described for the No Hierarchy pattern. Those are not described here again and this section of the guide focuses on the unique aspects of the hierarchy mapping.

For all the hierarchical patterns, select a Hierarchy Property (e.g. “has broader”) to connect items in the hierarchy.

Note

  • All hierarchical levels will be connected using the same relationship.

  • To create different relationships between levels, use the No Hierarchy pattern.

The Generate in inverse direction checkbox will reverse the direction of how the property specified in Hierarchy property is applied.

When working with Taxonomies collections, there is an option to select an existing Concept Scheme. If none is specified, the importer creates a new concept scheme using the name of the spreadsheet. All concepts that do not have hierarchical parents will be made top concepts of the scheme.

Path with Separator Pattern

This pattern works ONLY if the URI Construct Method is label.

For Path with Separator spreadsheets, in which a spreadsheet entry such as “World > Europe > France” indicates the hierarchical structure above the term “France”, the Hierarchy mapping section works as follows:

  • Select a column containing the path and type a separator e.g., “>”.

  • Identify the Column containing the last node of each path string.

  • In the Column Mapping section, to generate a name for imported resources, make sure to assign some column as the preferred label (in case of Taxonomies) or as the label (for all other asset collections). This will typically be the same column as the one you selected in the Hierarchy section as containing the last node of the path.

  • Map this column again in the Unique Identifiers section. If not specified, EDG will use row numbers to generate URIs. Alternatively, use other column(s) to generate URIs.
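
The way a path string encodes the hierarchy can be illustrated with a short sketch. The function name is hypothetical and the code only demonstrates the parsing idea, not EDG's importer.

```python
def parse_path(path: str, separator: str = ">"):
    """Split a path into its nodes and pair each node with its parent, top first."""
    nodes = [part.strip() for part in path.split(separator)]
    parents = [None] + nodes[:-1]
    return list(zip(nodes, parents))


pairs = parse_path("World > Europe > France")
last_node = pairs[-1][0]  # the term described by this row
```

Here `pairs` links France to Europe and Europe to World, and `last_node` is the value expected in the column containing the last node of the path.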

Column-based Tree Pattern

When using this pattern with a URI Construct Method of counter or uuid, the spreadsheet needs a single column containing the label for each asset. When the URI Construct Method is label, there is no need for such a column; EDG will assume that the hierarchy columns contain the labels.

For Column-based Trees spreadsheets, the Hierarchy mapping section works as follows:

  • Specify the top and bottom levels of the hierarchy by picking the first and last columns containing hierarchical levels. All hierarchical columns must be located together and sequentially in the spreadsheet.

  • If a column is mapped in the Hierarchy section, DO NOT map it in the Column Mapping section or the Unique Identifiers section. These sections are used ONLY for mapping columns that are not part of the column-based tree.

  • If the URI Construction Method is label, EDG will assume that the hierarchy columns contain labels of the respective resources. If it is not label then, as mentioned above, a separate column (outside of the hierarchy) needs to contain the label for each resource.

  • If the URI Construction Method is label, leave the Unique Identifiers section empty and EDG will use values in the hierarchical columns to generate URIs. Only make mappings in this section to override the Label-based approach and use some other values for the URIs.

  • Carefully examine the Column-based Trees sample layout on the Select Spreadsheet Type screen. It is important that each item in the hierarchy has a row of its own. See below for correct and incorrect options.

Column-based Tree Pattern Examples


It is important to remember that, as with all spreadsheet import options, all resources are imported as members of the single class selected at the start of the import mapping definition. For example, an import cannot treat Level 3 as countries and Level 2 as continents.

Path with Fixed-length Segments Pattern

This pattern requires a path column, where values are such that removing a string of a fixed length from a value identifies a parent for a resource on that row. For example, if using 2 character segments and “Australia” has a path column value of “010201”, its parent would be on a row with a path column value “0102” and its parent’s parent would be on a row with path column value “01”.

EDG finds a parent by removing the exact number of characters specified in the segment length from the end of the child’s path column value. The topmost items may have a path column value whose length differs from the segment length, e.g., 1 instead of 01.
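
The parent lookup described above amounts to trimming one segment from the end of the path value. The following sketch illustrates the rule; the function name is hypothetical and this is not EDG's implementation.

```python
def find_parent_path(path: str, known_paths: set, segment_length: int = 2):
    """Return the path value of the parent row, or None for a top-level row."""
    candidate = path[:-segment_length]
    # A row is top-level when trimming one segment does not yield another row's path.
    return candidate if candidate in known_paths else None


paths = {"01", "0102", "010201"}
australia_parent = find_parent_path("010201", paths)
grandparent = find_parent_path("0102", paths)
top = find_parent_path("01", paths)
```

The same rule also covers a top-level value shorter than the segment length: with paths `{"1", "102"}`, the row `102` resolves to the parent `1`.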

For Path with fixed-length Segments spreadsheets, the Hierarchy mapping section works as follows:

  • Specify the column with the path values.

  • Specify the length of the segments to use to calculate the parent row.

  • In the Column Mapping section, to generate a name for imported resources, make sure to assign some column as the preferred label (in case of Taxonomies) or as the label (for all other asset collections). Otherwise, labels will not be generated.

  • If the URI Construction Method is label, map the same column again in the Unique Identifiers section. If not specified, EDG will use row numbers to generate URIs. Alternatively, use other column(s) to generate URIs.

Self-join Pattern

For spreadsheets following the Self-Join pattern, the Hierarchy mapping section works as follows:

  • Specify the Column containing the parent ids. This column will not necessarily be used to generate URIs; it is simply a way to match children and parents.

  • Specify the Column containing the child ids. This column will not necessarily be used to generate URIs; it is simply a way to match children and parents.

  • In the Column Mapping section, to generate a name for imported resources, make sure to assign some column as the preferred label (in case of Taxonomies) or as the label (for all other asset collections). Otherwise, labels will not be generated. Typically, but not necessarily, this will be the column you used as a Column containing the child ids.

  • If the URI Construction Method is label, map the same column again in the Unique Identifiers section. If not specified, EDG will use row numbers to generate URIs. Alternatively, use other column(s) to generate URIs.
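
A self-join layout can be pictured as rows that point at each other by id. The sketch below uses hypothetical column names (`id`, `parent_id`, `label`) to show how children and parents are matched; it illustrates the idea rather than EDG's importer.

```python
rows = [
    {"id": "1", "parent_id": "", "label": "World"},
    {"id": "2", "parent_id": "1", "label": "Europe"},
    {"id": "3", "parent_id": "2", "label": "France"},
]


def link_parents(rows, child_col="id", parent_col="parent_id", label_col="label"):
    """Map each row's label to its parent's label (None for top-level rows)."""
    by_id = {row[child_col]: row for row in rows}
    links = {}
    for row in rows:
        parent = by_id.get(row[parent_col])
        links[row[label_col]] = parent[label_col] if parent else None
    return links


links = link_parents(rows)
```

The id columns serve only to join rows; the labels, and optionally the URIs, come from whichever columns are mapped for those purposes.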

Import Data Set from Spreadsheet

Import Data Set from Spreadsheet is available under the Import tab for Data Assets collections only.

It reads the input spreadsheet and creates an EDG Spreadsheets Workbook instance and a Spreadsheet DataSet instance that is part of the workbook, with a related DataSet Element instance for each spreadsheet column. The import also includes data profiling for each imported column, as shown in the following figures.

TopBraid EDG Data Profile Page


TopBraid EDG Frequencies Page


Import Crosswalk from Spreadsheet

Import Crosswalk from Spreadsheet is available under the Import tab for Crosswalk asset collections only.

The input spreadsheet must contain two columns:

  • the first column must contain the primary key used to build URIs of resources in the From asset collection;

  • the second column must contain the primary key used to build URIs of resources in the To asset collection.

Import Property Definitions (Schema) from a Spreadsheet

A spreadsheet can be used to create property definitions for a class in an Ontologies collection. For information on how to do this, see the guide on Creating Property Shapes from Spreadsheet Columns.

Import DDL File

Import DDL File is available under the Import tab and is only available for Data Assets and Datatypes. It reads DDL statements (CREATE TABLE, etc.) from an SQL file, and creates corresponding entities in EDG.

When importing into a Data Assets collection, the following entities from the DDL file are created:

  • a Relational Database

  • any Database Tables defined

  • any Database Views defined

  • the Database Columns of the tables and views

  • a Physical Data Model that serves as a container for the entities about the database

Database name: The importer will prefix all entity names with a database name, to distinguish the entities created by importing different databases. If no database name is specified, then the name of the SQL file will be used (e.g., NORTHWIND for northwind.sql). The database name serves a role similar to the Catalog names and Schema names within a database server.

Model for Datatype Definitions: The importer also stores the datatype of each table column. It will re-use existing datatype definitions for previously seen types, and create new ones for the rest. The drop-down tells the importer where to look for datatype definitions and where to import new ones. The options are:

  • Any EDG Datatypes that have been included into the Data Asset (via General > Includes).

  • The Data Asset itself.

To store imported datatype definitions, we recommend using EDG Datatypes rather than storing them in the Data Assets themselves.

SQL Compatibility

The DDL import functionality supports MySQL, Oracle, PostgreSQL, SQLServer, Hana, Snowflake, Teradata and Hive.

Two kinds of problems commonly occur during imports:

  1. SQL statements that cannot be parsed: When an input file cannot be parsed, the import process will be aborted and nothing will be imported. An error message will be shown indicating the location in the file where the parse error occurred. It may be possible to manually edit the SQL file to remove the unsupported SQL features.

  2. SQL statements only partially understood: In some cases, the importer will be able to understand the basic intent of a DDL statement, but not a specific parameter or argument to the statement. In this case, it will continue and import whatever was understood. Therefore, imported data should be carefully reviewed to ascertain that all needed information has been imported.

Customizing the DDL Import

The importing of DDL into Data Assets (import DDL file) and Datatypes (import DDL file) provides an extension point that allows developers to add custom post-processing behaviour to the DDL import. This advanced feature requires a good understanding of Extension Development. In a nutshell, it can be done by overriding the SWP element edg-importer:PostProcessImportedDDL. The arguments provided to the prototype are documented on the SWP element itself.

Import From JDBC Connection

Import From JDBC Connection is available under the Import tab and is only available for Datatypes and Data Asset Collections.

Functionally, import from JDBC has the same purpose as the import of a DDL file, except that the DDL source is a live connection to a database server rather than a DDL file.

The parameters for the import are:

Model for Datatype Definitions: If none is available, create a new EDG Datatype collection, which will automatically be populated with some standard data.

JDBC URL: The connection address-string, which may depend on the database type (e.g., jdbc:mysql://localhost:3306/mydb123)

User Name and Password: A database login with access to the desired tables.

Note

If the password is already in EDG’s secure storage, it may be omitted.

Database name: The desired database/schema that identifies the scope of this DDL import operation. If not provided, the import will use the default scope of the user or connection.

Include data statistics: If checked, this will compute statistics summarizing the data contained in each imported entity (table, view) and column. Edit or view the data asset collection and select each asset item to see the details of its resulting statistics.

Include data samples: If checked, this will collect sample rows from each entity (table, view). Edit or view the data asset collection and select each entity table or view to see the sample data.

Maximum number of data samples per table: When including data samples, the upper limit of rows to collect from each table for the sample data.

Record each new triple in change history: If checked, each imported triple is recorded in change history, which is not recommended for large imports.

Note

This option is not visible if Record Triple Count option under the Manage tab is activated.

Schedule Import: lets the user schedule imports to run on a recurring schedule.

Import RDF File

Import RDF File is available under the Import tab. Any asset collection can import data from an external RDF file. The Import > Import RDF File link shows a screen where the Browse button opens a dialog to select the external RDF file.

TopBraid EDG Import RDF File Page

TopBraid EDG Import RDF File Page

Choose the RDF file and select its Format, noting that the file may be compressed. The compression formats ZIP (.zip), gzip (.ttl.gz etc) and bzip2 (.ttl.bz2 etc) are supported. Only the first file in a ZIP archive will be imported.
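
As an illustration of preparing a compressed file for this importer, the following sketch writes a small Turtle document as gzip and reads it back to confirm the round trip. The file name and the example triple are hypothetical.

```python
import gzip
import os
import tempfile

# A tiny, hypothetical Turtle document.
TURTLE = """@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:JFK rdfs:label "John F. Kennedy International" .
"""


def write_gzipped_turtle(path: str, turtle: str) -> None:
    """Write Turtle text to a gzip-compressed file (conventionally named *.ttl.gz)."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write(turtle)


path = os.path.join(tempfile.mkdtemp(), "airports.ttl.gz")
write_gzipped_turtle(path, TURTLE)

# Read the file back to check that it decompresses to the original text.
with gzip.open(path, "rt", encoding="utf-8") as handle:
    round_trip = handle.read()
```

Keeping the `.ttl.gz` double extension makes both the RDF format and the compression evident when the file is selected for import.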

Then, if applicable select the following options:

  • Record each new triple in change history (use with care for large RDF files!). If importing into a Working Copy, history will always be recorded and this option is greyed out.

  • Direct streaming import into production copy, available only for users with at least Managers permission. Direct streaming is not available for import into Working Copies.

  • Perform constraint validation only: validates the RDF file content combined with the existing collection data. This is necessary because some violations only become apparent for the combined data.

Click Finish to complete the operation. A message will indicate whether the import was successful. For large imports, this process may take minutes. Check the status in the File History Report on the Reports tab.

Note

  • If an RDF file contains any “schema” definitions such as classes, properties, or shapes, then it can only be imported into an Ontologies collection.

  • If an RDF file contains both “instances” and “schema”, either split the file before import or follow the instructions in Copy or Move Instances from Other Asset Collection.

When importing RDF into a Working Copy, the addition of each triple will be recorded as an entry in the change history, where it will be available to all the relevant reports. When importing into a Production Copy, the Record each new triple in change history checkbox gives you the option of adding these to the change history.

Note

This is not recommended when importing large amounts of data.

The Direct streaming import into production copy option imports the content much more quickly and uses less memory. It should only be used for large imports if the user is confident that the data needs no validation or cleanup. It’s best to perform a backup (e.g. Export RDF File) of the collection prior to importing with direct streaming, or to use a workflow, so that reverting is possible should anything go wrong.

When importing RDF files into an Ontologies or a Taxonomies collection, EDG performs some transformations (unless the streaming import is chosen):

  • For Ontologies, “subclass of Thing” statements will be added for classes that have no parents. This is done to ensure that these classes are visible in the Class Hierarchy.

  • For Taxonomies, “narrower concept” relationships will be used to generate inverse “broader concept” relationships. This is done to ensure that such concepts are visible in the Concept Hierarchy.
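The two transformations above can be illustrated with a small sketch over triples represented as plain Python tuples. This is not EDG's implementation. It assumes that “subclass of Thing” corresponds to `rdfs:subClassOf owl:Thing`, that classes are typed as `owl:Class`, and that the “narrower” and “broader” relationships correspond to `skos:narrower` and `skos:broader`. The function names are hypothetical.

```python
def add_thing_superclass(triples):
    """Add (class, rdfs:subClassOf, owl:Thing) for classes without any parent."""
    triples = set(triples)
    # Assumption: classes are declared with rdf:type owl:Class.
    classes = {s for s, p, o in triples if p == "rdf:type" and o == "owl:Class"}
    with_parent = {s for s, p, o in triples if p == "rdfs:subClassOf"}
    added = {(c, "rdfs:subClassOf", "owl:Thing") for c in classes - with_parent}
    return triples | added


def add_broader_from_narrower(triples):
    """Add the inverse skos:broader triple for every skos:narrower triple."""
    triples = set(triples)
    inverses = {(o, "skos:broader", s) for s, p, o in triples if p == "skos:narrower"}
    return triples | inverses
```

For example, a taxonomy containing only `("ex:Europe", "skos:narrower", "ex:France")` would gain `("ex:France", "skos:broader", "ex:Europe")`, which is the kind of statement a broader-based hierarchy view needs in order to place `ex:France` under `ex:Europe`.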

Auto-create Property Shapes from a SPARQL Endpoint

Information in the Knowledge Graphs that are external to an EDG installation can be used to create property definitions in an Ontologies collection.

See details at Working with Wikidata and other external Knowledge Graphs.

Import from SharePoint Term store

Import from SharePoint Term store is available under the Import tab and is only available for Taxonomies. Its purpose is to connect directly to a SharePoint installation and import the stored terms it finds into an EDG taxonomy collection.

Once selected from the list of importers, EDG automatically uses the pre-configured SharePoint connector. The user is then prompted to select a Term Set found in SharePoint to import, and then to click Continue.

TopBraid EDG Import Sharepoint Term Store Select Set

The importer will then import the Term Set and populate an EDG model reflecting the Term Store concepts. The following figure shows the Term Store concepts as mapped into a taxonomy during import.

TopBraid EDG Imported Sharepoint Term Store Concepts

If the Term Set was previously imported, the newly imported model will be merged with the existing one. If a new, clean imported set of data is preferred, then delete the Term Set from the collection before re-importing it.

Create Multiple Asset Collections from TriG or Zip File

The Create Multiple Asset Collections from TriG or Zip File button is available after clicking the plus sign in the EDG header to Create new Asset Collection.

TopBraid EDG Create From Trig or Zip

Clicking the button opens the Import TriG or Zip File page, where the user clicks Choose File, selects the desired TriG or ZIP file, and then clicks Finish.

TriG is an RDF serialization whose files contain one or more named graphs. Zip files are general compressed archives that, in this case, contain RDF files, each representing a named graph. The named graphs that follow the EDG naming convention (e.g. urn:x-evn-master:geography_ontology) will be imported as asset collections with the importing user as the manager. Other named graphs will be imported as Turtle files in the EDG workspace. Any pre-existing graphs remain unchanged.

An import is not permitted if either of the following is true:

  • A matching named graph already exists in EDG to which the importing user does not have at least read permission.

  • There are any triples in the default graph of the TriG file.
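The rules above can be sketched as a pre-flight check that runs on a summary of the file's graphs before upload. This is only an illustration under stated assumptions. The function `plan_trig_import` and its parameters are hypothetical. The prefix `urn:x-evn-master:` is inferred from the single example above and may not cover the full EDG naming convention. The input is assumed to be a dictionary that maps each graph name to its triple count, with `None` standing for the default graph.

```python
EDG_GRAPH_PREFIX = "urn:x-evn-master:"  # assumption: inferred from the example above


def plan_trig_import(graph_sizes, unreadable_existing=frozenset()):
    """Classify named graphs, or raise if the import would not be permitted.

    graph_sizes: dict of graph name -> triple count (None = default graph).
    unreadable_existing: names of graphs that already exist in EDG and that
        the importing user cannot read (hypothetical input).
    """
    if graph_sizes.get(None, 0) > 0:
        raise ValueError("the default graph contains triples")
    named = sorted(name for name in graph_sizes if name is not None)
    blocked = [name for name in named if name in unreadable_existing]
    if blocked:
        raise PermissionError("no read permission for: " + ", ".join(blocked))
    collections = [n for n in named if n.startswith(EDG_GRAPH_PREFIX)]
    workspace_files = [n for n in named if not n.startswith(EDG_GRAPH_PREFIX)]
    return collections, workspace_files
```

Under these assumptions, graphs in the first returned list would become asset collections, and graphs in the second would be stored as Turtle files in the workspace.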

EDG can create TriG or Zip files for most asset collection types by using Export > Export [Asset Collection Type] with Includes as a File.

TopBraid EDG Export to Trig or Zip

EDG administrators can create TriG files using Server Administration > Create TriG file of all EDG production graphs (used for testing to back up or replicate a set of collections).

Import Property Definitions (Schema) from a SPARQL Endpoint

You can use information in the Knowledge Graphs that are external to your EDG installation to create property definitions in an ontology. For information on how to do this, see the guide on Working with Wikidata and other external Knowledge Graphs.

Import Concepts from Documents

Import Concepts from Documents is available only for Taxonomies. It is particularly suitable as a way to seed a new taxonomy.

When the user selects a file to import, EDG analyzes the file and extracts concepts from it. Extracted concepts are presented for selection in order of their frequency in the document. The user then selects the concepts to be added to the collection, where they are placed in an EDG-generated scheme that “holds” the new concepts. The user can then manually organize them hierarchically, as desired.

Import Single Document is available only for Corpora, allowing users to upload documents into a corpus one document at a time. Import of documents and associated metadata into a Corpora asset collection is performed dynamically according to the connector option selected when creating the corpus.

Imported Data Reports

File Imports Report is available under the Reports tab and shows the history of file imports into the collection, including any currently running file imports. The form refreshes periodically, and the refresh timer can be paused as shown.

TopBraid EDG File Import Reports Page

The report includes information about the importing user, the success or failure of the import, a summary, the datetime, the process ID and the workflow name, if applicable. Currently executing imports are included in the list.

When importing directly into an asset collection, the Record each new triple in change history checkbox is available. Selecting it records each individual change in the change history.

Warning

The default is unchecked and it is not recommended to select this option when importing large amounts of data. When this option is left unchecked, the change history will contain a record capturing that an import was executed, but will not contain every individual addition or deletion of data.

Record Triple Counts can be activated under the Manage tab for an asset collection. If activated, the change history will only record the numbers of added and deleted triples instead of the details about each triple. That significantly reduces the size of the change history. This choice removes the Record each new triple in change history option for the collection during import.

When the import is executed in the context of a workflow, each change is included in the change history, where it will be available to all the relevant reports.

Additionally, the decision not to retain detailed history upon completion of the workflow can be made when designing a workflow template.

Archive Working Copies on Commit can be activated under the Manage tab. If activated, detailed change history for committed working copies will be automatically archived.

Note

This takes precedence over the Record Triple Counts only option, i.e. the archive will contain the full history before it is compacted.