Many of our customers use TopBraid EDG to capture information about their data and computer technology ecosystem. This includes data sources, business applications, and the stakeholders and processes they support. To make this possible, TopQuadrant developed a suite of interconnected models: the TopBraid EDG ontologies. In this blog, we will take a look at them.
There are over 20 pre-built ontologies. Collectively, they contain hundreds of classes and an even greater number of properties. Classes and properties are described using SHACL, the W3C standard for describing knowledge graph schemas. This is an overview to get you oriented.
TopBraid EDG ontology models are ready for use, out of the box. They include:
- “EDG Schema – Base” and “EDG Schema – Core” – these ontologies contain classes common to all collection types.
- Ontologies corresponding to each asset collection type, e.g., “EDG Schema – Data Assets” – this ontology describes in more detail the different kinds of data assets that may be captured in Data Assets collections, such as data catalogs and data dictionaries, in EDG.
You can find the files containing these ontologies in your workspace by selecting Files from EDG’s Collections menu and then navigating to the edg.topbraidlive.org/1.0/schema folder. If you want to extend or modify any of these models, avoid changing the files directly. At the end of this blog, we describe a recommended process for modifying EDG ontology models.
Before we talk about what is contained in each ontology, we describe the two key types of classes used by the EDG ontologies, Asset classes and Aspect classes:
- Asset classes are easy to understand – they define all the asset types, e.g., Database, Dataset, Report, Software System and so on. These are concrete things you will find easy to relate to. Individual assets are created in EDG as instances of one of the asset classes. Asset classes are organized hierarchically using subclass relationships between more specific and more general asset types, e.g., Security Policy is a subclass of Policy. The root of the hierarchy is the class Asset. Asset classes can also have subclass relationships with aspect classes, inheriting aspects (property groups) as “traits” or “features”.
- Aspect classes are more of an abstract notion – an aspect class comprises all properties describing only a certain aspect of an asset, not the asset itself, e.g. asset documentation. These classes have names like Narratable, Identifiable, Processable, etc. They are used to organize different qualities (aspects or features) of assets. Aspect classes should have no direct instances. All aspect classes are subclasses of the class Aspect/Feature. Aspect classes do not have hierarchical relationships with each other. Using aspect classes makes it easier to manage and maintain large and complex ontologies.
This will become clearer if we look at the example below.
Asset is a class of type Asset Class. It is a subclass of the Status Aspect, Narratable and Identifiable. This means that all assets are:
- “Statusable” – can have a status and associated dates
- Narratable – can have description, purpose, etc.
- Identifiable – can have acronyms, labels, identifiers, etc.
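One way to build intuition for the asset/aspect split is to think of aspect classes as mixins. The sketch below is only an analogy in plain Python, not how EDG stores its models: the class names follow the text, while the attribute names (`status`, `description`, `acronym`, `label`) and the helper `aspects_of` are made up for illustration.

```python
class Aspect:
    """Root of all aspect classes. Aspects group properties and have no direct instances."""

    def __new__(cls, *args, **kwargs):
        # Only classes that are also assets may be instantiated.
        if not issubclass(cls, Asset):
            raise TypeError(f"{cls.__name__} is an aspect and has no direct instances")
        return super().__new__(cls)


class Statusable(Aspect):
    status = None        # hypothetical property: a status (plus associated dates)


class Narratable(Aspect):
    description = None   # hypothetical property: description, purpose, etc.


class Identifiable(Aspect):
    acronym = None       # hypothetical property: acronyms, labels, identifiers


class Asset(Statusable, Narratable, Identifiable):
    """Root of the asset hierarchy; it picks up all three aspects."""

    def __init__(self, label):
        self.label = label


class Requirement(Asset):
    """A more specific asset type; it inherits the aspects through Asset."""


def aspects_of(cls):
    """List the aspect classes a given asset class inherits, by name."""
    return sorted(
        c.__name__
        for c in cls.__mro__
        if issubclass(c, Aspect) and not issubclass(c, Asset) and c is not Aspect
    )


print(aspects_of(Requirement))  # ['Identifiable', 'Narratable', 'Statusable']
```

Creating `Requirement("Retain audit logs")` works, while calling `Narratable()` raises a `TypeError`, which mirrors the rule that aspect classes should have no direct instances.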
Subclasses of Asset are more specific types of assets – as shown below:
Each of the subclasses of Asset may have its own set of properties in addition to the properties they inherit from parent classes. For example, let’s take a look at the class Requirement.
Since this class is a subclass of Asset, a requirement has all the properties defined for the Asset class. Additionally, the Requirement class defines a number of properties that are specific to requirements. It also has a number of subclasses for different types of requirements.
Here is another view of the Requirement class – showing other things a requirement may be connected to e.g., use cases.
As you would guess, if we were to look at other classes, we would see that many have a connection to requirements because most assets we are keeping information about have some associated requirements. For example, as shown below, Enterprise Asset is a subclass of the aspect class Traceable. The Traceable aspect comprises relationships that describe the traceability of an asset.
Traceable things are all those that can be connected to requirements and mapped to business glossary terms. Enterprise assets are things like forms, reports and processes. Technical assets (software and infrastructure) and data assets (databases, datasets, etc.) are also traceable.
If a group of properties is reusable across assets that are not subclasses of each other (e.g., enterprise, technical and data assets are not subclasses of each other), EDG ontologies define an aspect class to hold this group of properties. In this example, that is the aspect class Traceable. As we already mentioned, aspect classes make it easier to organize and maintain large and complex ontologies.
All aspect classes have the type Aspect Class. Asset Class and Aspect Class are used to distinguish between the two different types of classes in EDG ontologies. They are subclasses of both RDFS Class and SHACL Node Shape. RDFS and SHACL are W3C ontology modeling languages. RDFS is a very minimal language. SHACL extends it to offer much richer ways to describe models.
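To make the typing a little more concrete, here is a minimal sketch of the standard RDFS rule at work: if a resource has type C and C is a subclass of D, the resource also has type D. It uses plain Python tuples rather than a real triple store, and the `ex:` names are placeholders, not the actual EDG identifiers.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Placeholder triples mirroring the description above (ex: names are illustrative only).
TRIPLES = {
    ("ex:AssetClass", SUBCLASS_OF, "rdfs:Class"),
    ("ex:AssetClass", SUBCLASS_OF, "sh:NodeShape"),
    ("ex:AspectClass", SUBCLASS_OF, "rdfs:Class"),
    ("ex:AspectClass", SUBCLASS_OF, "sh:NodeShape"),
    ("ex:Asset", RDF_TYPE, "ex:AssetClass"),
    ("ex:Traceable", RDF_TYPE, "ex:AspectClass"),
}


def superclasses(cls, triples):
    """Return cls plus every class reachable from it through rdfs:subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(o for s, p, o in triples if s == current and p == SUBCLASS_OF)
    return seen


def types_of(resource, triples):
    """Return the direct types of a resource plus the types inferred via subclassing."""
    direct = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    inferred = set()
    for t in direct:
        inferred |= superclasses(t, triples)
    return inferred


print(sorted(types_of("ex:Asset", TRIPLES)))
# ['ex:AssetClass', 'rdfs:Class', 'sh:NodeShape']
```

Because every asset or aspect class ends up typed as both a class and a node shape, the same resource can carry the subclass hierarchy and the SHACL property definitions.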
Let’s come back to the immediate subclasses of Asset. With one exception, each of these classes corresponds to a type of an Asset Collection. In other words, each class is the main entity for a certain category of asset collections. Going from left to right in the first diagram, these classes are:
Requirement – the main entity for the Requirements asset collections. These are catalogs of requirements. There can be multiple Requirements asset collections. The separation can be along the subject areas or along the type of requirements.
Technical Asset – this can be a software asset or a hardware asset. Software assets are further partitioned into a number of subclasses such as applications, systems, software modules and software executables. Technical Asset is the main entity for the Technical Assets asset collections. These are catalogs of technical assets. There can be multiple Technical Assets asset collections. The separation can be along the types of assets (e.g., hardware versus software) and/or along the subject areas.
Governance Asset – assets used to describe an organization’s governance framework. Governance Assets are further partitioned into subclasses such as Metric, Policy and Governance Process. Governance Asset is the main entity for the Governance Model asset collection. This is a special asset collection in TopBraid EDG in that there is only one Governance Model collection for a given installation of TopBraid EDG.
Glossary Term – the parent class for terms that are specialized as Business Term, Industry Term or Technical Term. It is the main entity for the Glossary asset collections. There can be multiple glossaries.
Enterprise Asset – includes such things as Business Activities, Business Functions, Business Capabilities, Job Roles, Organizations, Parties, and Information Assets. It is the main entity for the Enterprise Assets asset collections. There can be multiple collections of this type.
Datatype – the parent class for all datatypes: scalars, enumerated values including scales, and structured types. It is the main entity for the Datatypes asset collections. There can be multiple collections of this type.
Data Asset – any data item that is of value to the enterprise. It can be a Database, a Dataset, a Data Element and many other different types of data assets. EDG includes classes for each of these different types of data assets. Data Asset is the main entity for the Data Assets asset collections. There can be multiple collections of this type. Some Data Assets collections can be dataset catalogs, others could be holding information about logical models and so on.
Big Data Asset – the parent class for things such as Big Data Data Assets, Big Data Configuration Assets, Big Data Jobs and Big Data Files. It is the main entity for the Big Data Assets asset collections. There can be multiple collections of this type.
Lineage Model – establishes a context for determining how enterprise capabilities, business functions and information assets are dependent on data flows, applications, software executables and data transformations across data sources and sinks. It is the main entity for the Lineage Model asset collections. There can be multiple collections of this type.
When you create a new asset collection such as a Glossary or a Data Assets collection, an ontology model for that type of collection is automatically included. You don’t need to do anything. You can just start using EDG as-is. However, sometimes you may want to modify the models or add to them.
EDG models are open for extension and modification. You can look at the underlying models when you are in, say, a Glossary or a Data Assets collection, but you will not be able to modify them there: graphs of these types contain business terms and data assets respectively, and the ontologies are simply included in them by reference.
To modify one of the EDG ontologies:
1. Create a new Ontology asset collection in EDG
2. Include in it the EDG model you want to extend or modify.
You will see these models listed in the Edit Includes dialog in EDG. Their names always start with “EDG SCHEMA”. The rest of the name is self-explanatory since it corresponds to the type of collection. You can filter the list of graphs available for inclusion by typing “EDG” or “EDG Schema” in the Refine box of the Edit Includes dialog – as shown below.
“Other” in the Collection Type column means that this graph is a file in the EDG workspace and not an asset collection itself.
After creating an ontology in EDG that will contain customizations of EDG models, you will be able to make changes in it such as:
- Create new classes as subclasses of EDG classes – these can be assets or aspects
- Add new property definitions
- Deactivate the pre-built property definitions – this may be necessary if you do not want to use some of the pre-defined properties
- “Hide” classes you don’t want to use
- Use SHACL node shapes to build role-specific views for some of the classes
- And more …
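To give a feel for what such customizations amount to at the SHACL level, the sketch below generates a small Turtle snippet for two of the changes listed above: a new subclass with its own property definition, and the deactivation of an existing property shape. The SHACL terms (`sh:property`, `sh:path`, `sh:datatype`, `sh:deactivated`) are standard, but every `ex:` and `edgx:` name and both namespaces are placeholders, and what EDG actually writes when you make these changes in its editor may well differ.

```python
# Placeholder namespaces: edgx: stands in for the real EDG schema namespace.
PREFIXES = """\
@prefix ex:   <http://example.org/my-extensions#> .
@prefix edgx: <http://example.org/edg-placeholder#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
"""


def subclass_with_property(new_class, parent, metaclass, prop, datatype):
    """Turtle for a new class under `parent` with one SHACL property definition."""
    shape = f"{new_class}-{prop.split(':')[1]}"  # e.g. ex:MyClass-myProperty
    return (
        f"{new_class} a {metaclass} ;\n"
        f"    rdfs:subClassOf {parent} ;\n"
        f"    sh:property {shape} .\n"
        f"{shape} a sh:PropertyShape ;\n"
        f"    sh:path {prop} ;\n"
        f"    sh:datatype {datatype} .\n"
    )


def deactivate(property_shape):
    """Turtle that switches off an existing property shape via sh:deactivated."""
    return f"{property_shape} sh:deactivated true .\n"


extension = (
    PREFIXES
    + "\n"
    + subclass_with_property(
        new_class="ex:RegulatoryRequirement",   # hypothetical new asset class
        parent="edgx:Requirement",              # placeholder for the pre-built class
        metaclass="edgx:AssetClass",            # placeholder for the Asset Class type
        prop="ex:citation",                     # hypothetical new property
        datatype="xsd:string",
    )
    + "\n"
    + deactivate("edgx:Requirement-someProperty")  # hypothetical pre-built shape
)

print(extension)
```

Keeping statements like these in your own ontology, rather than in the shipped files, is what allows the pre-built models to stay untouched.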
For information on how to do this, see the TopBraid EDG User Guide. Also useful is the overview video on working with ontologies.
Your last step is to tell EDG that asset collections of a certain type should now be based on your modified version of the pre-built EDG ontology. This can be done in the Server Administration console on the EDG Configuration Parameters page. For example, if you created extensions to the EDG-SCHEMA Glossary ontology, you can tell EDG to automatically include your modified model whenever a new Glossary is created. Alternatively, you can go to a specific glossary and use the Edit Includes dialog to include your extensions in that glossary only.