We fielded several questions as part of our recent webinar, Metadata Management is Key to Data Governance Initiatives. The webinar focused how TopBraid Enterprise Data Governance (EDG) lets you manage metadata across all your data sources, irrespective of their location, structure and format. |
Questions about Metadata Management
Q1: What is the best way to learn ontology modeling?
TQ offers an excellent training class on W3C standards and ontology modeling: Click here to learn more
There are several other companies that also offer training. It is important to look for the training provider that keeps up to date with the ontology standards development. TQ does this by being a member of W3C and participating in its working groups.
You can also find online tutorials, books and training materials. One useful book is Semantic Web for the Working Ontologist by Dean Allemang.
Q2: Mapping to Glossary term: Is there an option to create or import a glossary for the terms used in a project? In other words, how did you get the relevant glossary of terms?
We have imported it from a spreadsheet. Most terms came from the SEC Glossary which we converted to text and then put into a spreadsheet.
In general, the simplest way to import is from a spreadsheet. There is no pre-set spreadsheet structure for you to use since you can simply map any column in the spreadsheet to any available field.
Q3: Can EDG be configured to work across corporate boundaries? E.x., multiple companies are working on a shared system that integrates their business processes.
Yes, certainly. It is a web based system, so users across different organizations can have logins. TopBraid EDG integrates with LDAP for single sign-on. When used across organizations, you probably don’t have a shared LDAP system, so you can create users directly in EDG and give them appropriate privileges.
Q4: Is your interface capable of being displayed in French?
TopBraid EDG supports multiple languages by letting you tag text fields with language labels. This includes names of all assets and associated properties. Since much of the UI is driven of the metadata – names of the assets, names of the asset collections, information about them, even actions that are available for each collection (e.g., imports) are all model-driven, they can be displayed in any language. This is one of the many advantages of the model-driven UI. In fact, EDG will take the language preferences set in your browser.
Below is a screenshot that shows how the info in the demo would have looked like to a user with the browser preference set to French. You will see that a few items on the screen are still in English because they are defined in the application code as opposed to the models – like “Home” and “Go to”, but 90+% of what users will see will be in the national language because it is model and content driven. (“User” is actually a part of the model, but we didn’t create a French label for it).
We also offer text classification capabilities in French and several other languages. Some of our customers use TopBraid EDG to manage taxonomies for text classification and we can do this for French documents. And we have French speaking personnel and an implementation partner.
Q5: Can we build custom workflows?
A default workflow involves creating a working copy for an asset collection and making changes there. You can have multiple working copies running at the same time with multiple people collaborating in the working copies. The working copy with changes is then reviewed and approved by the responsible personnel. Notifications are sent when events happen – such as a status of a working copy is changed or a new task is created. You can define who gets notified for each type of event.
In addition (or instead of) storing tasks in EDG, you can use out of the box integration with JIRA and use JIRA workflow capabilities.
You can also build custom workflows using a third party workflow editor/engine, integrating it with EDG through web services and events. This lets you built workflows that are not self-contained in EDG, but span multiple systems. For example, a change in reference data may start and be approved in EDG, but the workflow doesn’t complete until it is implemented in a database(s) that use reference data or a change to the database schema may start outside of EDG, but not considered to be approved until it is reviewed by the data stewards in EDG. Since most of the actions users can make in TopBraid are available as services and EDG events can trigger any action (not just sending of an e-mail, but posting a JMS message or calling a service), you can easily integrate TopBraid EDG with any modern workflow tool. We’ve have customers that use Tibco (or similar) to support such workflow.
Q6: Do you support the concept of a “Dataset” (as opposed to “data element”)? If “yes”, how?
Database columns are a special sub-class of data elements – data elements that are organized into database tables. You can have other types of data elements as well. Just like tables contain data elements which are columns, datasets also contain data elements.
There is also a concept of a “dataset schema” which defines specific data elements that belong to a dataset. If you want to describe datasets that share a common schema, you can use that concept.
Finally, you can define your own asset types for anything that is not already covered with the pre-built models. In the demo you have seen us create a new field for already existing class. Similarly, we could have created a brand new class and defined its fields.
Q7: How do you link data/information to business processes? do you have business processes as assets?
Yes, business processes are assets in the “Enterprise Assets” collections
Q8: Where it fits in, could you give a brief description on “Governance Assets” and “Crosswalks”?
“Governance Assets” contain information about policies, KPIs and other relevant metrics.
“Crosswalks” are typically used to map two Reference Datasets. You can use them to define many-to-many connections between codes. There are pre-built services that will translate data coded with one set of codes to data coded with another set of codes. This is the most common use case, but you can use them for other purposes as well.
Q9: Is EDG’s central mapping schema based on any ISO or other metadata standards beyond W3C?
The short answer is “yes”. In case you are interested in more details, we have also provided a longer answer.
W3C standards are used as a language (knowledge representation) for describing metadata. They don’t impose much of the metadata themselves except for a few pre-built properties like a label or a type. The underlying standard is RDF – a graph-based data model for making connected statements. You can express anything in RDF. Other W3C standards add on top of it a way to build ontologies (e.g., to say that something is a class and that it has certain properties), to define rules and to query data.
TopBraid EDG’s pre-built ontologies incorporate concepts from many enterprise architecture standards such as DoDAF and TOGAF. ISO/IEC 11179/Part 3 is also incorporated with attributes like name, title, permissible value, etc. ISO/IEC 11179/Part 3 describes the meta-model of the registry. Originally, it was developed with the assumption that the registry is stored in the relational database – basically, providing a pre-set schema that could be implemented in a relational database. The updated version released in 2015 has been influenced by the W3C standards (RDF, OWL and SKOS) and reflects them.
Much of ISO/IEC 11179 is focused on either a concept of registration, on the naming conventions (part 5) or on recommending best practices for the textual descriptions (part 4). For example, part 5 describes how data elements should be named with options like: Cost Budget Period Total Amount or CostBudgetPeriodTotalAmount or Cost_Budget-Period_Total_Amount. The main guidance is that the name of the object goes first, then the qualifier, then the name of the element. It is up to a user to decide about separators. Many use periods as separators.
TopBraid EDG can enforce any convention once you have decided to use it. In our own examples for the demo, we have not followed this typical ISO 11179 convention and, instead, reversed the column name as follows: “column name (database name.table name)”. This was a choice made for usability. We tried first with the order specified in ISO and found it less user friendly for sorting – if users want to see all columns of a particular table, they can easily do so by clicking on the table itself or by setting the table = Cost as criteria in the search form. But if they want to see all “like” columns sorted together, it is not as easily done with the column name being at the end. However, it is a matter of preference and the decision on the naming conventions is entirely up to the users of the tool. TopBraid EDG will support the decision.
Q10: What are some of the importers? the ones you can natively connect and import?
TopBraid EDG comes with the general “building blocks” for imports from common formats such as spreadsheets, XML, RDBMs, etc. And we offer powerful mapping and transformation capabilities. This way, users have flexibility to support import of any information from any tool.
Most of the work for building a tool-specific importer is about deciding what data to put where. In other words, mapping the meta-models. Some of this can be defined in a general way, but not everything. Organizations may extend TopBraid EDG models to create their own asset types and their own fields and/or disable some of the fields. Organizations may also customize the source system to capture some information that is not available in it out of the box and/or have used some of the “out of the box” fields in their own custom way. For this reason, certain amount of custom mappings and transformations is likely to be always needed. Our focus is on making this easy to do.
We find that the generic spreadsheet importers and database connections cover most of the situations because majority of tools let you export their content as a tabular spreadsheet or a spreadsheet-like text. TopBraid EDG does not mandate one specific format for such spreadsheets. They can have any structure. This is a simple, yet powerful capability.
Spreadsheets can be flat tables or express hierarchies. Data in them can be text, numbers or other literals or it can represent relationships between items in the cells. Users map columns in the spreadsheet to fields defined in EDG. Once you have done the mapping, you can save it as a “template”, so that you can import the spreadsheet with the same structure again without repeating the mapping. Using this capability our customers have successfully imported SAP data dictionaries, Erwin models and many other sources without TopQuadrant’s involvement.
Further, TopBraid EDG also comes with an IDE (called TopBraid Composer) that lets you assemble pre-built import modules into custom import scripts. This can be done using visual drag and drop UI or a more traditional text-based scripting language. See, for example SPARQLMotion. Such scripts can include any transformations.
Every script is available as a RESTful web service and/or can be declaratively included in the UI. TopBraid EDG UI is model driven not only in terms of what assets are available, but also in terms of what imports, exports or reports are available. In that sense, adding another import action is similar to adding another field. Because importers are web services, they can be scheduled to run or triggered by the source tool when a change occurs or used in the Enterprise Service Bus architectures (there is a pre-built JMS integration).
As an example, we recently had a customer who wanted to import from IBM Business Glossary. The spreadsheet importer could readily bring most of the data, but more complete information required using XML since IBM Business Glossary XML export contains more complete information – such as custom fields that may have been defined for this organization. It took a less than a week to create such importer out of the pre-built capabilities. This included ability to add new custom fields to EDG as part of the import. Much of the work was about understanding how this customer was using IBM Business Glossary and how they wanted this information to look like in EDG.
Q11: Does it integrate with SAS?
Please refer to the answer above to Q10.
Q12: What possibilities does EDG have to harmonize and integrate different ontologies?
You can combine different ontologies by including them into each other. This will let you, for example, create your own ontology, include two other ontologies into it and define relationships between them. For example:
- You may want to make your own Customer class a sub-class of a more general Customer class in another ontology. In this case, your class will inherit information from the parent class. Or you can add new properties to the class from another ontology. This is what we have done in the demo by adding “special id” property to a pre-existing ontology for data assets.
- Or you can create a relationship between your class and class in another ontology. In this approach ontologies you are extending or integrating do not change – you are only adding connecting statements to them.
- If you disagree with some of the statements in the ontologies you are wanting to re-use, the new W3C standard called SHACL (Shapes and Constraints Language) lets you disable these statements.
There are some “out of the box” merging capabilities that will merge two concepts into one. For example, you may want to look for classes or properties with the same labels (names) and combine them. If you do this, you will be actually changing one or both of the original ontologies. When such transformation happens at the model level, there is always a question to what extent it has any impact on data. Meaning, is there data that uses a particular ontology class, for example, and, if so, should it be changed? We’d like to better understand your use case as most of the transformations we have seen are about integrating and harmonizing data expressed using different ontologies. Our approach to doing this is through SPINMap where users can map two ontologies and mappings can be directly executed to align and transform data.
When users map columns of the spreadsheet they are importing, under the covers, they are actually creating a SPINMap. While TopBraid EDG can run SPINMaps and create them from the simple mapping UI, imapping n the current release, the fully featured graphical tool is only available in the IDE, TopBraid Composer. We are planning to bring this capability to the web UI later this year, aligning it with SHACL. In the meantime, you can learn more about SPINMap by following these links:
- Composing the Semantic Web
- SPINMapRDBMS
- Putting your Drag and Drop SPINMap Vocabulary Mappings into Production
Q13: How would things differ b/w when you are doing in house development vs implementing off the shelf apps?
Most organizations have a combination of in house developed applications and off the shelf applications. Many also have built their own mini-modules to complete missing functionality in vendor products or to integrate and make them interoperable with their other systems.
Capturing all systems’ data structure – from database instance, data collection, data group/schema/database, data set, data element, and reference data would be treated the same, irrespective of whether they are custom developed or vendor-provided or a combination.
As an example, we’ve had customers who imported into EDG data dictionaries from SAP and other vendor packages. Data assets used by the vendor packages would be catalogued and mapped up to the enterprise semantics through semantic mappings (what information is where in the data?) and lineage mappings (how does information flow as data in motion, how is it’s meaning changed at each step? what business rules are applied?). And of course, to capture the complete data spec from conceptual models/meanings into performative physical data structures along with the data constraints (e.g., nullability, datatype), and permissible meanings/values. Ultimately having this information is necessary in order to rise up to the enterprise level of information management. It does not need to be done all at once, but could be staged to capture it once, reuse it thereafter, and fill gaps and fix mistakes as they arise.