Database Type: Application data storage

The Database Type parameter selects the storage technology that EDG uses for application data. If you choose RDBMS or Data Platform, additional settings must be configured.

EDG offers three types of persistence technology: (1) relational database, (2) Jena TDB files, and (3) Data Platform. TDB is available in two configurations, which gives the four storage types listed below.

App data storage type: In-memory + RDBMS persistence
Description: A relational database: Oracle, Microsoft SQL Server, or MySQL. This option requires further RDBMS configuration, as described below.
File extension: .sdb

App data storage type: TDB (One database per graph)
Description: Apache Jena TDB, configured so that each graph uses its own TDB database. For EDG instances with this storage type, setting ulimit to unlimited is recommended. This prevents EDG from reaching the limit on open files on your instance.
File extension: .tdb

App data storage type: TDB (Shared graph database)
Description: Apache Jena TDB, configured so that all graphs share a single TDB database. Data is stored in the _Data folder at the root of the workspace.
File extension: .xdb

App data storage type: Data Platform
Description: Data Platform as a data store. This enables all EDG collections to be synced between EDG nodes. See TopBraid Data Platform for specific instructions on setting up Data Platform.
File extension: .dpc
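The ulimit recommendation for TDB (One database per graph) can be checked before starting EDG. The following is a minimal sketch, assuming a Unix-like host and assuming that the relevant limit is the per-process open-file limit (the one reported by ulimit -n), which is an inference from the "too many open files" wording above. The helper names are hypothetical. It reports the limits of the process that runs the script, so it is only meaningful when run as the same user and in the same environment as the EDG server.

```python
import resource  # Unix-only standard library module


def open_file_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = open_file_limits()
print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

If the soft limit is a small number, raising it is a task for the system administrator; how to do that depends on the operating system and on how EDG is launched.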

Because EDG’s own system graphs also depend on the data storage type, changing the application data storage is tantamount to a new installation. The storage type cannot be changed once selected. If you need to change datastores, you must create a new EDG workspace and import the data again.

The TDB options require no additional setup or parameters. Each RDBMS option requires additional configuration, as described below.

Note

The choice of back-end storage is mainly a matter of customer preference until you reach large-scale data of more than 30 million triples.

  • With the TDB options, the database lives in the workspace file system on the server. On the Base URI Management page, the connector files appear with the extensions .tdb and .xdb, for example “graph name”.tdb.

  • With TDB – one database per graph (gTDB), each graph has its own database.

  • With TDB – shared graph database (xTDB), one database contains all the graphs.

  • With RDBMS, the data resides in a database on another server and the connector files reside in the workspace. The connector file extension is .sdb.
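Because each storage type has its own connector file extension, the extension alone identifies how a graph is stored. The lookup below is a small illustrative sketch built only from the extensions listed in this section; the function name and the example file names are hypothetical.

```python
from pathlib import PurePosixPath

# Extension-to-storage-type mapping, taken from the table in this section.
STORAGE_BY_EXTENSION = {
    ".sdb": "In-memory + RDBMS persistence",
    ".tdb": "TDB (One database per graph)",
    ".xdb": "TDB (Shared graph database)",
    ".dpc": "Data Platform",
}


def storage_type(connector_name):
    """Return the storage type implied by a connector file's extension."""
    extension = PurePosixPath(connector_name).suffix.lower()
    return STORAGE_BY_EXTENSION.get(extension, "unknown")


# Hypothetical connector file names, for illustration only.
for name in ("geo.tdb", "customers.xdb", "orders.sdb"):
    print(name, "->", storage_type(name))
```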

Neither TDB option uses as much memory as an RDBMS option, and neither loads all the data into cache at server startup. You will notice significantly quicker startup times with TDB.

The difference between Oracle, MySQL, and SQL Server is minimal as far as EDG is concerned. They process reads and writes differently, so performance may differ slightly with large amounts of data. If you choose RDBMS, choose the database that your DBAs are most comfortable maintaining and tuning.

Even organizations that expect to have a relatively small number of triples often choose a TDB option over RDBMS in order to get up and running more quickly and to have fewer moving parts.

With any option you choose, it’s important to keep your workspace regularly backed up or use server snapshots.

RDBMS Configuration Parameters

For the RDBMS parameters, the corresponding database must already exist before a user can use the web-based EDG interface to create a new vocabulary in that database.

Parameter: RDBMS URL
Description: The URL of the relational database. For example, jdbc:oracle:thin:@localhost:1521:delphi, where delphi is the name of the instance, or jdbc:mysql://localhost:3306/myDatabase, where myDatabase is the name of the database. The named database must already exist on the database server. Common formats for the RDBMS URL include:

  • jdbc:mysql://<server>/<database>

  • jdbc:oracle:thin:@//<server>:<port>/<service>

  • jdbc:oracle:thin:@<server>:<port>:<SID>

  • jdbc:sqlserver://<server>[:<port>][/database][;property=value]

NOTE for SQL Server: A single backslash “\” in the URL string may cause a problem in the vault storage file for the password. Alternatives are (1) to use a double backslash “\\”, (2) to store the password using Password Management, or (3) to replace the backslash “\” element with a keyword assignment, e.g., “…;instanceName=myInstance;…” instead of “…\myInstance;…”.
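The URL formats above can be assembled from their parts with a few helper functions. This is only a sketch: the function names are hypothetical, the host name dbhost is made up, and the SQL Server helper deliberately covers only the ;property=value form so that a named instance is expressed with the instanceName keyword rather than a backslash, as the note suggests.

```python
def mysql_url(server, database):
    """Build jdbc:mysql://<server>/<database>."""
    return f"jdbc:mysql://{server}/{database}"


def oracle_service_url(server, port, service):
    """Build jdbc:oracle:thin:@//<server>:<port>/<service>."""
    return f"jdbc:oracle:thin:@//{server}:{port}/{service}"


def oracle_sid_url(server, port, sid):
    """Build jdbc:oracle:thin:@<server>:<port>:<SID>."""
    return f"jdbc:oracle:thin:@{server}:{port}:{sid}"


def sqlserver_url(server, port=None, properties=None):
    """Build jdbc:sqlserver://<server>[:<port>][;property=value]..."""
    url = f"jdbc:sqlserver://{server}"
    if port is not None:
        url += f":{port}"
    # Each property is appended as ;key=value, in insertion order.
    for key, value in (properties or {}).items():
        url += f";{key}={value}"
    return url


# The two examples from the RDBMS URL description:
print(mysql_url("localhost:3306", "myDatabase"))
print(oracle_sid_url("localhost", 1521, "delphi"))
# A named SQL Server instance without a backslash (hypothetical host):
print(sqlserver_url("dbhost", properties={"instanceName": "myInstance"}))
```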

Parameter: RDBMS database type
Description: The supported type of relational database being used.

Parameter: RDBMS user name
Description: A service account with appropriate permissions to the database.

Parameter: RDBMS Update Batch Size
Description: OPTIONAL: The number of rows written to the SQL database in each batch. If unset, 1000 is used. Adjusting it might improve bulk insert performance.

Parameter: RDBMS Update Fetch Size
Description: OPTIONAL: The number of rows returned from the SQL database on each network round trip. Some values have special meanings for particular database types, and not all databases use this value.

NOTE: Leaving the Batch and Fetch sizes unset should generally yield acceptable loading/caching performance. Each can be fine-tuned for a particular application by adjusting it up or down and observing the performance changes.
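To make the two size parameters concrete, the sketch below shows what writing in batches and reading in fixed-size fetches looks like in general. It is not EDG's implementation: it uses Python's built-in sqlite3 module with an in-memory database, so there is no real network round trip, and the triples table and function names are hypothetical. Only the default of 1000 comes from the text above.

```python
import sqlite3

DEFAULT_BATCH_SIZE = 1000  # value the text gives for an unset batch size


def batches(rows, size=DEFAULT_BATCH_SIZE):
    """Yield successive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def write_in_batches(conn, rows, size=DEFAULT_BATCH_SIZE):
    """Insert rows one batch at a time; return the number of batches."""
    count = 0
    for batch in batches(rows, size):
        conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", batch)
        conn.commit()
        count += 1
    return count


def read_in_chunks(conn, fetch_size):
    """Read every row back, taking at most `fetch_size` rows per fetch."""
    cursor = conn.execute("SELECT s, p, o FROM triples")
    chunks = []
    while True:
        chunk = cursor.fetchmany(fetch_size)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
rows = [(f"s{i}", "p", f"o{i}") for i in range(2500)]

n_batches = write_in_batches(conn, rows)       # batches of 1000, 1000, 500
chunks = read_in_chunks(conn, fetch_size=400)  # six fetches of 400, one of 100
print(n_batches, len(chunks))
```

Larger sizes mean fewer trips to the database but more rows held in memory at once, which is why the note above suggests adjusting each value and observing the effect.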