In a recent report, “Augmented Data Catalogs: Now an Enterprise Must-Have for Data and Analytics Leaders, September 2019”, Gartner discusses the importance of Knowledge Graphs for effective metadata management. The report points out that a solution supporting effective metadata management must span data catalogs, business glossaries, data lineage information, rules, semantic connections and more. Spanning all these items requires connecting them, and that is what a Knowledge Graph is very good at. Further, it creates connections in a flexible and evolvable way.
A key problem with the traditional approaches and products for data and metadata management is that they are not easily connected to one another, so they end up creating more metadata silos. These silos don’t help make data more valuable, usable, searchable, discoverable (or any of the other good ‘-able’ words).
So what is a Knowledge Graph?
First, it is not a black box. It is yours to define, evolve and use, because it is based on open standards for semantic graphs. These open standards allow it to represent any knowledge domain. And by represent, I mean to capture in a graph not only facts about the entities you are interested in, but also the classes, properties and rules that define these entities. No more hard-coding of business logic in application code. Knowledge Graphs are a “smart” approach to meaningfully bridging enterprise metadata silos by:
- Enabling connections between any kind of data
- Capturing and preserving meaning
- Supporting the business value of data (e.g., data quality, lineage, etc.)
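Because classes and properties are stored as triples alongside the facts, a generic engine can derive new facts from the model instead of relying on hard-coded logic. The sketch below is a plain-Python illustration, not an RDF library: it represents a graph as a set of (subject, predicate, object) tuples and applies a simplified version of the RDFS domain and range entailment rules. All `ex:` names are hypothetical.

```python
# A knowledge graph sketched as a set of (subject, predicate, object) triples.
# Prefixes such as "rdfs:" and "ex:" are plain strings; no RDF library is used.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    # Model: classes and properties are themselves stored as triples.
    ("ex:Person", "rdf:type", "rdfs:Class"),
    ("ex:hasParent", "rdf:type", "rdf:Property"),
    ("ex:hasParent", "rdfs:domain", "ex:Person"),
    ("ex:hasParent", "rdfs:range", "ex:Person"),
    # Data facts about entities (hypothetical example data).
    ("ex:James", "ex:hasParent", "ex:Andrew"),
}


def apply_domain_range(g: set[Triple]) -> set[Triple]:
    """Return g plus the rdf:type triples implied by rdfs:domain and
    rdfs:range declarations (a simplified form of two RDFS entailment rules)."""
    domains = {(s, o) for s, p, o in g if p == "rdfs:domain"}
    ranges = {(s, o) for s, p, o in g if p == "rdfs:range"}
    inferred: set[Triple] = set()
    for s, p, o in g:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, "rdf:type", cls))  # subject gets the domain class
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, "rdf:type", cls))  # object gets the range class
    return g | inferred


enriched = apply_domain_range(graph)
people = sorted(s for s, p, o in enriched if p == "rdf:type" and o == "ex:Person")
print(people)  # ['ex:Andrew', 'ex:James']
```

Nothing in the code mentions people or parents; the knowledge that both ends of `ex:hasParent` are persons comes entirely from the model triples, so changing the model changes the conclusions without touching the code.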
For example, the diagram below shows:
- In the lower section of the diagram, a Knowledge Graph fragment saying that a person (James) has a parent (Andrew) and his eyes are blue – these are our data facts.
- In the middle section, a Knowledge Graph fragment saying that a person has two parents and an eye color, and that a male parent is a father; this is our model of a person.
- In the upper section, a Knowledge Graph fragment capturing a rule that says that if both parents have blue eyes, that person has blue eyes as well – further extension or enrichment of the model with rules.
With this, even if we did not know James’ eye color, but knew that both of his parents have blue eyes, our Knowledge Graph could infer that James has blue eyes.
All the layers of the Knowledge Graph (data facts, models and rules) reside together or can be connected.
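To make the layering concrete, here is a minimal sketch that keeps the facts, the model and the rule from the James example side by side and applies the rules until nothing new is derived (simple forward chaining). The second parent `ex:Mary` and all predicate names are hypothetical, and the rules are written as Python functions purely for illustration; in a standards-based Knowledge Graph they could instead be expressed in the graph itself, for example as SHACL rules or SPARQL CONSTRUCT queries.

```python
# Facts for the James example as (subject, predicate, object) triples.
# "ex:Mary" (James' second parent) and all predicate names are hypothetical.
Triple = tuple[str, str, str]

facts: set[Triple] = {
    ("ex:James", "ex:hasParent", "ex:Andrew"),
    ("ex:James", "ex:hasParent", "ex:Mary"),
    ("ex:Andrew", "ex:gender", "male"),
    ("ex:Andrew", "ex:eyeColor", "blue"),
    ("ex:Mary", "ex:eyeColor", "blue"),
    # James' own eye color is deliberately absent.
}


def parents(g: set[Triple], person: str) -> set[str]:
    return {o for s, p, o in g if s == person and p == "ex:hasParent"}


def father_rule(g: set[Triple]) -> set[Triple]:
    """Model: a male parent is a father."""
    return {
        (child, "ex:hasFather", parent)
        for child, p, parent in g
        if p == "ex:hasParent" and (parent, "ex:gender", "male") in g
    }


def blue_eyes_rule(g: set[Triple]) -> set[Triple]:
    """Rule: if both parents have blue eyes, the child has blue eyes."""
    new: set[Triple] = set()
    for child in {s for s, p, o in g if p == "ex:hasParent"}:
        ps = parents(g, child)
        if len(ps) == 2 and all((q, "ex:eyeColor", "blue") in g for q in ps):
            new.add((child, "ex:eyeColor", "blue"))
    return new


def infer(g: set[Triple], rules) -> set[Triple]:
    """Apply every rule repeatedly until no new triples appear (a fixpoint)."""
    g = set(g)
    while True:
        new = set().union(*(rule(g) for rule in rules)) - g
        if not new:
            return g
        g |= new


kg = infer(facts, [father_rule, blue_eyes_rule])
print(("ex:James", "ex:eyeColor", "blue") in kg)        # True
print(("ex:James", "ex:hasFather", "ex:Andrew") in kg)  # True
```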
Let’s now take a look at another example, shown in the next diagram. For governance, version control, access, and many other reasons, you may want (and need) to maintain different knowledge graphs. Below we have three different knowledge graphs named KG1, KG2, and KG3:
- In the first, KG1, we are doing data asset management, for example, mapping data elements to glossary terms, elements to elements, flagging PII, assigning stewardship, and other great things.
- In KG2, we are doing technical asset management.
- In KG3, the focus may be on enterprise asset management.
As separate, naturally modular efforts, each graph is very manageable and relatively quick to create, but by bringing them together we can compose information that never existed in any one of them.
For example, we may have a dataset with a PII data element in KG1 and a policy restricting the physical location of PII storage in KG3. By bringing the graphs together, we can infer that the dataset is located in an area where the policy applies, which enables us to enforce the policy.
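The payoff of composing graphs can be sketched the same way: keep KG1, KG2 and KG3 as separate triple sets, then run one rule over their union. This is a minimal illustration under stated assumptions: every identifier is hypothetical, and the fact that places the dataset's server in a region is assumed to come from KG2 (technical assets), a detail the example above leaves open.

```python
# Three separately maintained knowledge graphs, keyed by name.
# All identifiers are hypothetical. The storage-location facts are ASSUMED
# to live in KG2 (technical asset management).
Triple = tuple[str, str, str]

graphs: dict[str, set[Triple]] = {
    "KG1": {  # data asset management
        ("ex:PatientDataset", "ex:hasElement", "ex:SSN"),
        ("ex:SSN", "ex:classification", "PII"),
        ("ex:SSN", "ex:mapsToTerm", "ex:SocialSecurityNumber"),
    },
    "KG2": {  # technical asset management
        ("ex:PatientDataset", "ex:storedOn", "ex:Server42"),
        ("ex:Server42", "ex:locatedIn", "ex:EU"),
    },
    "KG3": {  # enterprise asset management
        ("ex:PIIResidencyPolicy", "ex:restrictsPIIStorageIn", "ex:EU"),
    },
}


def objects(g: set[Triple], subject: str, predicate: str) -> set[str]:
    return {o for s, p, o in g if s == subject and p == predicate}


def policies_applicable(g: set[Triple]) -> set[Triple]:
    """Infer (dataset, ex:subjectTo, policy) when a dataset with a PII element
    is stored on a server located in a region that the policy restricts."""
    result: set[Triple] = set()
    for ds, p, elem in g:
        if p == "ex:hasElement" and (elem, "ex:classification", "PII") in g:
            for server in objects(g, ds, "ex:storedOn"):
                for region in objects(g, server, "ex:locatedIn"):
                    for policy, p2, r in g:
                        if p2 == "ex:restrictsPIIStorageIn" and r == region:
                            result.add((ds, "ex:subjectTo", policy))
    return result


# No single graph supports the conclusion on its own...
assert all(not policies_applicable(g) for g in graphs.values())
# ...but their union does.
merged = set().union(*graphs.values())
print(policies_applicable(merged))
# {('ex:PatientDataset', 'ex:subjectTo', 'ex:PIIResidencyPolicy')}
```

Keeping the graphs separate preserves their independent governance and versioning; the new fact exists only in the composed view.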
Further, you can look at a Knowledge Graph through different lenses. Below is a high-level, interactive lineage diagram (we call it a LineageGram) that is dynamically generated from metadata about our data ecosystem and the model describing it. In this example, the metadata captured in the Data Governance system plays the role of the data facts. The model lets us approach the information as a message that we can drill into when we need to explore the what and the how in more depth.
For instance, on the right is a Patient Discharge Form. Suppose we want to understand the impact of changing one or more fields on the form. Here we see how complex the processing of a patient discharge actually is, and the LineageGram allows us to assess the impact of such a change. The paths of the lineage are interactive (clickable), so you can follow your nose as you see fit. Clicking the Data Flow icon presents me with the corresponding derivation (second image below).
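Underneath, this kind of impact assessment comes down to graph traversal: follow derivation edges backward from a form field to see what it depends on, or forward from a source to see what a change would touch. The following is a minimal sketch, assuming lineage is available as simple (source, target) pairs; the node names are hypothetical, and this is not how the LineageGram itself is implemented.

```python
from collections import deque

# Hypothetical lineage edges: (source, target) means "target is derived from source".
# A real lineage view would be generated from governance metadata, not hard-coded.
edges = [
    ("EHR.admissions.diagnosis_code", "ETL.normalize_codes"),
    ("ETL.normalize_codes", "Warehouse.encounter.dx"),
    ("Warehouse.encounter.dx", "DischargeForm.primary_diagnosis"),
    ("Pharmacy.orders.med_id", "ETL.reconcile_meds"),
    ("ETL.reconcile_meds", "DischargeForm.medications"),
    ("Warehouse.encounter.dx", "Report.readmission_risk"),
]


def reachable(edges: list[tuple[str, str]], start: str, upstream: bool = False) -> set[str]:
    """Breadth-first traversal of the lineage graph from `start`.
    upstream=False follows derivations forward (what a change would affect);
    upstream=True follows them backward (what a field is derived from)."""
    adj: dict[str, list[str]] = {}
    for src, dst in edges:
        a, b = (dst, src) if upstream else (src, dst)
        adj.setdefault(a, []).append(b)
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Everything the form's diagnosis field is derived from:
print(sorted(reachable(edges, "DischargeForm.primary_diagnosis", upstream=True)))
# ['EHR.admissions.diagnosis_code', 'ETL.normalize_codes', 'Warehouse.encounter.dx']

# Everything affected if the upstream diagnosis code changes:
print(sorted(reachable(edges, "EHR.admissions.diagnosis_code")))
# ['DischargeForm.primary_diagnosis', 'ETL.normalize_codes', 'Report.readmission_risk', 'Warehouse.encounter.dx']
```

The forward query shows why a change can reach further than expected: altering the source diagnosis code touches not only the discharge form but also a report that shares the same warehouse column.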
I want to conclude this blog with a quote from Gartner’s report that I found to be insightful: “…a knowledge graph grows and becomes more comprehensive as more metadata sources are ingested, and business users enrich it with context and semantics.”