Thanks to everyone who attended our first in a series of TopBraid Enterprise Data Governance (EDG) webinars. We demonstrated how a standards-based approach to data governance supports the diversity and rate of change that define today’s complex data ecosystems. The webinar was very well attended with people coming from industries as varied as financial services, healthcare, publishing and manufacturing.

During the webinar, attendees submitted several interesting questions. We were only able to answer a few of them during the webinar, so we’d like to address all of the questions here.

Questions about Data Governance and TopBraid EDG:

Q1: Data Lineage Tracking: How is the documentation in EDG connected to the programs/modules implementing these calculations, and how are changes tracked in EDG?

Programs, ETL scripts, etc. are described in the Technical Assets model. Lineage models are used when you want to describe how data flows across the enterprise in a more comprehensive way than, for example, saying that one data element is derived from another data element. In defining lineage, you would say that a particular data element or set of elements is a source for another data element or field in a report, and you would also capture what program performs this transformation or calculation. With this, one should think of a lineage as an overlay on top of the data assets and technical assets.
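The overlay idea can be illustrated with a minimal Python sketch. This is not EDG's data model or API. The class names, fields and sample assets are all hypothetical and only show lineage links that point at separately defined data assets and technical assets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    """A data element, column or report field (hypothetical)."""
    name: str


@dataclass(frozen=True)
class TechnicalAsset:
    """A program or ETL script (hypothetical)."""
    name: str


@dataclass(frozen=True)
class LineageLink:
    """Overlay record: sources feed a target via a technical asset."""
    sources: tuple
    target: DataAsset
    performed_by: TechnicalAsset


def upstream(target, links):
    """Return every data asset that directly or indirectly feeds `target`."""
    found = set()
    stack = [target]
    while stack:
        current = stack.pop()
        for link in links:
            if link.target == current:
                for src in link.sources:
                    if src not in found:
                        found.add(src)
                        stack.append(src)
    return found


# Illustrative assets only; none of these names come from EDG.
gross = DataAsset("orders.gross_amount")
tax = DataAsset("orders.tax_amount")
net = DataAsset("warehouse.net_amount")
report_field = DataAsset("revenue_report.total_net")

links = [
    LineageLink((gross, tax), net, TechnicalAsset("compute_net.etl")),
    LineageLink((net,), report_field, TechnicalAsset("revenue_report.sql")),
]

sources = sorted(asset.name for asset in upstream(report_field, links))
print(sources)
```

Because the links refer to assets rather than containing them, the asset descriptions stay unchanged when lineage is added, removed or revised.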

Q2: Do data assets also include documents (PDF, PPT files) so that you know EDG used data in included tables, figures?

Data assets are primarily about structured data sources. They describe technical metadata such as databases, tables, columns, datasets and dataset schemas. They could also describe logical models. Enterprise assets have support for capturing documents and metadata about them. Having said this, you can easily extend the scope of the data asset model. You can create a new entity (class) that should be a part of the data assets and model any properties it should have.

Q3: Can you connect physical assets to glossary terms by batch?

Yes, any relationship can be updated using a batch update.

Q4: Can you assign different stewards (compliance steward, technical steward, etc.) to specific types of metadata?

As we have shown in the webinar, EDG models can be easily modified or extended. There are no pre-built relationships for saying that a given resource “has compliance steward” or “has technical steward.” There is only a “has steward” relationship, but this is a good example of an extension an organization may want to make.
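One way such an extension might be modeled is to define the more specific steward relationships as specializations of “has steward,” similar in spirit to rdfs:subPropertyOf in RDF. The Python sketch below only illustrates that idea. The property names, resources and people are hypothetical, and it does not show how the extension is actually configured in EDG.

```python
# Each specialized property points at the general property it refines.
# All names are hypothetical.
SUB_PROPERTY_OF = {
    "hasComplianceSteward": "hasSteward",
    "hasTechnicalSteward": "hasSteward",
}

# Statements as (subject, property, object) triples.
triples = [
    ("CustomerTable", "hasComplianceSteward", "alice"),
    ("CustomerTable", "hasTechnicalSteward", "bob"),
    ("OrderTable", "hasSteward", "carol"),
]


def is_a(prop, ancestor):
    """True if `prop` equals `ancestor` or specializes it."""
    while prop is not None:
        if prop == ancestor:
            return True
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def stewards(resource, prop="hasSteward"):
    """List stewards of `resource` assigned via `prop` or a specialization."""
    return sorted(o for s, p, o in triples if s == resource and is_a(p, prop))


print(stewards("CustomerTable"))
print(stewards("CustomerTable", "hasTechnicalSteward"))
```

With this design, anything that asks for stewards in general still finds the compliance and technical stewards, while more specific questions remain answerable.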

Q5: What ontologies come with the product out of the box?

The only ontologies that are required are those that describe the different assets managed by EDG.

We also ship a few examples, such as FIBO, so that users have something to experiment with. If there is any ontology you would like to include, doing this is very easy with EDG. You simply go to the Ontologies page in the Navigator and click on “Create New.” Then, after creating it, you can load/import the ontology file into it.

Q6: Is the glossary built on SKOS?

Many of the SKOS properties are included in the definition of a glossary term, but there are also other properties that are not part of SKOS.

Q7: How easily can third-party reference data tools integrate (specifically Informatica MDM)?

If reference data is not held in TopBraid EDG, you have a number of options.

If you have a reference data management system and that system offers APIs, then you can connect through that. If there are no APIs, then you could simply say that “permissible values” are held by such-and-such system and add some text description to explain what they may be or how to look them up.

A similar approach would work in a case where there is no reference data management system per se and reference data is managed on an ad hoc, distributed basis.

Q8: How does the tool work with data modeling tools? Are the connections from the models to the physical assets automatic?

Yes; if connections exist in the modeling tool, they can be automatically imported. For example, one customer imports spreadsheets produced by Erwin. They prefer this route to importing DDLs or trying to establish a real-time connection to databases because in Erwin they already have connections between the logical and physical models.

Q9: How can these tools be used collaboratively?

EDG is a server solution with role-based access control. Its collaborative features include comments, tasks, notifications and working copies.

Q10: Do they keep audit trails?

Yes; every change is recorded in an audit trail.

Q11: Are there mechanisms for publishing assets (e.g., glossaries) to make them publicly available as web pages, etc.?

Yes.

Q12: Where can we go to get more information on the capabilities and configurability of the API access to reference data (e.g., can the user define a RESTful interface?)?

Yes, the user can define RESTful interfaces. For more information, see this blog post on web services and TopQuadrant products and this blog post on creating web services with the TopBraid platform.

Q13: Will you elaborate more on ontologies? Conceptual data models are not ontologies. Glossaries are best derived from ontologies. Can I import FIBO?

You can import any ontology, including FIBO. We pre-package FIBO, but only as an example and for convenience. Our release cycle is different from the FIBO release cycle, so you can always get the most up-to-date version from the EDM Council.

In our experience, there are many more organizations that have business glossaries in some format (often spreadsheets) than organizations that have ontologies. EDG lets its users import a business glossary from a spreadsheet because it is the most commonly used starting point.

If you do have ontologies and prefer to derive business glossaries from them, you can do so. Or you can keep them as ontologies. Users can view and navigate them in EDG and access them through APIs just as easily as they can with glossaries.

Q14: Which triplestore do you use for EDG?

The product ships with two built-in triplestore options: a TQ-enhanced version of Jena SDB that uses an RDBMS of your choice as a triplestore, and Apache Jena TDB. Another option that requires a separate license is a MarkLogic triplestore.

Q15: How could provenance tracking of data be supported by TopQuadrant?

There are a number of pre-built properties that can be used to track provenance. These depend on the nature of the asset for which provenance is being tracked. In the case of data (e.g., data elements, datasets, etc.), “derived from” may be useful. But if more complete provenance is required, then we would recommend using lineage models as demonstrated in this video.

In the case of other assets such as glossary terms, there are other properties that are useful in the tracking of provenance such as “source,” “originated by,” “influenced by,” etc.

And, of course, you can always extend and modify the models to ensure that the properties important to you are present.

Q16: How can different vocabularies and ontologies be accessed like a lookup service?

There is a global lookup service that works across all assets governed by TopBraid EDG.

Q17: What if you are trying to govern a data/data type across different databases (Oracle and SQL)?

This is not an issue. You can select any of the data types included with EDG and/or any data type that is used in your databases. In fact, we can populate your custom data types directly from a database’s DDL.

Q18: Do you have support for NIEM? What does that support consist of?

Using TopBraid’s transforms for converting XSD Schemas to RDF/OWL, TopQuadrant has converted NIEM XSD Schemas to ontologies. These ontologies can be imported into and used with EDG. NIEM enumerations are expressed in EDG as codelists.

Q19: Much of the knowledge we need is about requirements and resides with functional people who are not motivated by governance. We need to show them the benefits ASAP. Do you have materials telling us how to do that?

Governance is usually a means to an end. The end may be regulatory compliance or operational effectiveness, etc. Making a compelling case for these is highly dependent on the specifics of an industry sector and requires deep expertise in each particular sector.

We are building out some industry-specific demos and resources, but they may not yet target your sector or your line of business. We think that in general, companies specializing in management/strategy consulting and system integration are best positioned to credibly produce materials on how data governance delivers various results business people need to achieve.

Q20: Do you support description of service and application interfaces, not just databases?

Yes; this is available in the Technical Asset models.

Q21: I know that TopBraid offers some (reasoner-based?) help in generating mappings between schemas.

There is a SPINMap capability that supports creating executable mappings between schemas. For more information, see Reshaping Relational Data using SPINMap.

Q22: Can calling interfaces also be modeled?

Absolutely.

Q23: We will have a high volume of small transactions (many tps) and some bulk ETL, both likely to require performance enhancers like parallelism and load balancing. Reasoners are too slow. Do you offer them, perhaps by exporting mapping descriptions to partners?

Generally speaking, the TopBraid platform offers technologies and tooling to quickly create any kind of a textual output. Such output may be a script that is executed by another system.
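As a rough illustration of that pattern, the Python sketch below renders a mapping description into a SQL script that another system could execute. It does not use TopBraid tooling. The mapping structure, table names and column names are hypothetical.

```python
from string import Template

# Template for the generated script; the SQL shape is an assumption.
SQL_TEMPLATE = Template(
    "INSERT INTO $target_table ($target_cols)\n"
    "SELECT $source_cols\n"
    "FROM $source_table;"
)


def render_sql(mapping):
    """Turn a mapping description into an INSERT ... SELECT statement."""
    target_cols = ", ".join(col["target"] for col in mapping["columns"])
    source_cols = ", ".join(col["source"] for col in mapping["columns"])
    return SQL_TEMPLATE.substitute(
        target_table=mapping["target_table"],
        target_cols=target_cols,
        source_cols=source_cols,
        source_table=mapping["source_table"],
    )


# Hypothetical mapping description.
mapping = {
    "source_table": "staging.orders",
    "target_table": "warehouse.orders",
    "columns": [
        {"source": "order_id", "target": "id"},
        {"source": "gross - tax", "target": "net_amount"},
    ],
}

script = render_sql(mapping)
print(script)
```

In this arrangement, the high-volume work is done by whichever engine runs the generated script, so its parallelism and load balancing apply.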

Q24: What if the data to be governed is the executable code in the system?

TopBraid EDG offers Technical Asset Models to support this requirement.

Q25: What rate of change of data values can be handled? What rate of appearance of new data entities can be handled?

Typically, metadata, reference data, business glossaries and even things like policies or business applications do not change very rapidly, at least not compared to operational, transactional data. Having said this, there are no built-in limitations with respect to the rate of change. The audit trail would likely be the most affected, and users can always archive some of the history if this becomes an issue.

In addition to the questions asked during the webinar, attendees made a couple of interesting observations, which we have included below:

Extend DIFFERS from customize. It may be interesting for a poll question to ask to rank extend, customize, elaborate, etc.

True, it may be interesting to tease out and define different categories of ways in which users can change EDG to better suit their unique requirements.

A system has progress properties and safety properties. It looks like EDG now enables awareness of safety properties for any given progress trajectory. Nice!

Yes, exactly. Thanks.

If you have more questions or would like to arrange a demo of TopBraid EDG, let us know at edg_demo@topquadrant.com.