KNOWLEDGE GRAPHS FOR INTEGRATED DATA GOVERNANCE

What is a Knowledge Graph?

There has been a lot of buzz about knowledge graphs lately. What are they? Why talk about knowledge graphs in the context of information management? Google and Microsoft use “knowledge graphs” for smart search. In fact, if you google for this term, one of the first hits is a Wikipedia article saying that “The Knowledge Graph is a knowledge base used by Google and its services to enhance its search engine’s results with information gathered from a variety of sources.”

This post talks about knowledge graphs in a more general sense, as a technology. It also outlines how knowledge graphs provide a powerful platform for integrated data governance.

A knowledge graph is an interconnected set of information that is able to meaningfully bridge enterprise metadata silos.

Knowledge graphs are capable of supporting this because they are:

Flexible: graphs are the most flexible formal data structures, so other data formats are simple to map to them. Because graphs capture explicit relationships between items, you can easily connect new data items as they are added and traverse the links to understand the connections.
Evolvable: able to accommodate diverse data and metadata that adjust and grow over time, much like living things do.
Semantic: the meaning of the data is stored alongside the data in the graph, in the form of ontologies or semantic models. This makes knowledge graphs self-descriptive: a single place to find the data and understand what it is about.
Intelligent: the semantics of the data are explicit and include formalisms that support inferencing and data validation. As self-descriptive data models, knowledge graphs enable data validation and can offer recommendations for how data may need to be adjusted to meet data model requirements. They also make it possible to draw conclusions and derive new information from the available data.
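
To make the "flexible" and "evolvable" qualities concrete, here is a minimal sketch of mapping a tabular record to graph statements. The namespace http://example.org/, the function row_to_triples and the sample record are all assumptions made up for illustration; they do not come from any particular product.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def row_to_triples(row: Dict[str, str], key: str, entity_type: str) -> List[Triple]:
    """Turn one tabular record into (subject, predicate, object) statements."""
    subject = f"{BASE}{entity_type}/{row[key]}"
    # First statement records what kind of thing the row describes.
    triples = [(subject, f"{BASE}type", f"{BASE}{entity_type}")]
    # Every remaining column becomes one statement about the subject.
    for column, value in row.items():
        if column != key:
            triples.append((subject, f"{BASE}{column}", value))
    return triples


row = {"id": "cust-42", "name": "Acme Corp", "country": "DE"}
graph = set(row_to_triples(row, key="id", entity_type="Customer"))

# Evolvable: a new kind of link is one more statement, with no table redesign.
graph.add((f"{BASE}Customer/cust-42", f"{BASE}ownsDataset", f"{BASE}Dataset/sales-2023"))
```

The point of the sketch is that the record and the later-added link live in the same structure, so nothing has to be restructured when a new kind of relationship appears.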

A knowledge graph allows you to store information in a graph model and use graph queries to easily navigate highly connected datasets.
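
As a rough illustration of what "graph queries" over connected data look like, the sketch below stores a few statements in a Python set and navigates them with a wildcard pattern match and a breadth-first traversal. The ex: names and the helper functions match and reachable are hypothetical. A real deployment would use a graph database and its query language; this only shows the idea.

```python
from collections import deque
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical governance metadata, written as (subject, predicate, object).
graph: Set[Triple] = {
    ("ex:CustomerTable", "ex:storedIn", "ex:CRMDatabase"),
    ("ex:CustomerTable", "ex:describedBy", "ex:CustomerTerm"),
    ("ex:CustomerTerm", "ex:ownedBy", "ex:SalesDepartment"),
    ("ex:CRMDatabase", "ex:hostedOn", "ex:Server7"),
}


def match(graph: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield statements matching the pattern; None acts as a wildcard."""
    for t in graph:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def reachable(graph: Set[Triple], start: str) -> Set[str]:
    """Follow outgoing links breadth-first and return every node reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _, _, target in match(graph, s=node):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(start)
    return seen


owners = sorted(o for _, _, o in match(graph, p="ex:ownedBy"))
connected = reachable(graph, "ex:CustomerTable")
```

Here owners answers "who owns anything?", while connected answers "what is this table linked to, directly or indirectly?", which is the kind of question that is awkward to ask across separate metadata silos.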

With these qualities, knowledge graphs are an ideal and, arguably, the only viable foundation for bridging and connecting information about all enterprise assets.

Is Every Graph a Knowledge Graph?

Graph technology is gaining adoption, and you may hear more and more about it. But not all graphs are created equal. Though flexibility of the data structure is common to all graph technologies, other knowledge graph qualities, such as the ability to capture the description of data, are not necessarily present in all of them. Thus, not every graph is a knowledge graph. The two most popular graph data models today, property graphs and RDF graphs, differ strongly in their support for this capability:

Property graphs: Neo4j is one example of a database that implements a property graph data model, and there are a number of others, including open source property graph databases. There is no standard definition of the "property graph" data model and no standard serialization of the data these databases store. Different vendors implement their own variants of the model, which limits their interoperability and the ability to connect and query them in a uniform way. Property graphs are flexible and evolvable, like all NoSQL databases, but they are not semantic: they lack a language (a representation) for storing the meaning of data. Consequently, they can't, on their own, be regarded as "intelligent."
RDF (Resource Description Framework) graphs: a number of databases are based on the RDF data model. They are offered by commercial vendors, and there are also open source implementations. The generic name for the database technology that implements the RDF data model is "triple store," a name that reflects the fact that each graph statement consists of three parts: subject, predicate and object, just like a simple sentence.
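
To show the three-part shape of a statement, here is a small sketch that models a triple and writes it out as a line of N-Triples, one of the W3C serialization formats for RDF. The class, the function to_ntriples and the example.org identifiers are assumptions for illustration, and the literal escaping is deliberately simplified (it handles only backslashes and double quotes).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # identifier of the thing being described
    predicate: str  # identifier of the relationship or attribute
    obj: str        # another identifier, or a plain literal value
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a line of (simplified) N-Triples."""
    if t.obj_is_literal:
        # Simplified escaping: backslash and double quote only.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# "Acme Corp is located in Germany" as subject, predicate, object.
located = Triple(
    "http://example.org/AcmeCorp",
    "http://example.org/locatedIn",
    "http://example.org/Germany",
)
named = Triple(
    "http://example.org/AcmeCorp",
    "http://example.org/name",
    "Acme Corp",
    obj_is_literal=True,
)

lines = [to_ntriples(located), to_ntriples(named)]
```

Each resulting line reads like a short sentence, and because the format is standardized, a line written by one tool can be loaded by another.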

Figure: Neighborgram visualization of a snippet of a knowledge graph in TopBraid EDG, showing FIBO (Financial Industry Business Ontology).

RDF Graphs (+ other standards) Support Descriptive, Meaningful Data

RDF is a W3C standard (just like XML and HTML). Thus, the RDF data model is interoperable across vendors. It offers standard data serialization formats for bringing data in and out (such as Turtle, RDF/XML and JSON-LD) and a standard query language, SPARQL. Complementing RDF, W3C has also standardized languages for describing data semantics and inferring new data facts:

RDF Schema (RDFS) is a simple language that offers minimal support for data description.
SHACL (Shapes Constraint Language) is a newer W3C standard that provides a more complete schema language for RDF, and more.
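
The sketch below imitates, in plain Python, the two kinds of behavior these languages describe: inferring new facts from a subclass hierarchy (in the spirit of rdfs:subClassOf) and validating that instances of a class carry a required property (loosely analogous to a SHACL minimum-count constraint on a class). It is a toy under stated assumptions, not an RDFS reasoner or a SHACL engine; the ex: names and both helper functions are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

graph: Set[Triple] = {
    # Schema: the meaning lives in the same graph as the data.
    ("ex:SavingsAccount", SUBCLASS, "ex:Account"),
    ("ex:Account", SUBCLASS, "ex:FinancialProduct"),
    # Data
    ("ex:acct1", TYPE, "ex:SavingsAccount"),
    ("ex:acct1", "ex:owner", "ex:alice"),
    ("ex:acct2", TYPE, "ex:Account"),
}


def infer_types(graph: Set[Triple]) -> Set[Triple]:
    """Add a type statement for every superclass until nothing new appears."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != TYPE:
                continue
            for c, p2, parent in list(inferred):
                if p2 == SUBCLASS and c == o and (s, TYPE, parent) not in inferred:
                    inferred.add((s, TYPE, parent))
                    changed = True
    return inferred


def check_required(graph: Set[Triple], target_class: str, required: str) -> List[str]:
    """Return instances of target_class that lack the required property."""
    instances = {s for s, p, o in graph if p == TYPE and o == target_class}
    have = {s for s, p, _ in graph if p == required}
    return sorted(instances - have)


closed = infer_types(graph)
violations = check_required(closed, "ex:Account", "ex:owner")
```

After inference, ex:acct1 is also known to be an ex:Account and an ex:FinancialProduct even though nobody stated that directly, and the validation step reports ex:acct2 as the account that is missing an owner.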

RDF Graphs Provide Globally Unique Identifiers

Every resource in an RDF graph has a globally unique, dereferenceable, web identifier — a URI. With its URI, a resource can be reliably referred to and accessed from any application.

Just as not all graphs are created equal, not all GUIDs (globally unique identifiers) are created equal. For systems that do not use URIs, "GUID" is really a misnomer. The IDs they generate are not global across an enterprise ecosystem. They are global only in the context of a given installation of a given system, which can hardly be called global. They represent a decades-old legacy view of identity.

The Web and, more recently, blockchain technologies made this view obsolete, because it encourages silos that your enterprise will then have to spend significant effort connecting, often in brittle, expensive and inefficient ways. Yet chances are that this type of legacy technology (e.g., with system-local GUIDs) is what your data governance and related systems (e.g., business glossaries, metadata management) are using. Using URIs as identifiers, on the other hand, ensures that enterprise assets are uniquely identified and addressable. Two different systems can use the same identifier (URI) when they talk about the same asset. Thus, graphs of data from such systems come together seamlessly, without a separate data integration effort. You can also dereference a URI to retrieve data about the asset it represents.
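
A minimal sketch of why shared URIs make graphs "come together": if two systems describe the same asset under the same identifier, combining their statements is a plain set union, with no key mapping step. The two source graphs, the asset URI and the property names are all hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

ASSET = "http://example.org/asset/customer-table"  # hypothetical shared URI

# Statements held by a (hypothetical) business glossary.
glossary_graph: Set[Triple] = {
    (ASSET, "http://example.org/definedAs", "Master list of customers"),
    (ASSET, "http://example.org/steward", "http://example.org/person/alice"),
}

# Statements held by a (hypothetical) data catalog about the same asset.
catalog_graph: Set[Triple] = {
    (ASSET, "http://example.org/storedIn", "http://example.org/db/crm"),
    (ASSET, "http://example.org/steward", "http://example.org/person/alice"),
}

# Same identifier in both sources, so merging is a set union;
# the statement both systems agree on appears only once.
merged = glossary_graph | catalog_graph

facts_about_asset = sorted((p, o) for s, p, o in merged if s == ASSET)
```

Had each system used its own local ID for the table, a lookup table between the two ID spaces would be needed before any of these facts could be combined.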

RDF Graphs are Connectable

Another key feature of RDF graphs is that they are connectable, exactly like the web. You can think of one giant knowledge graph spanning the world. You can also think of multiple, separate (but connectable) knowledge graphs, much like individual websites or web pages.

From the discussion above, you can see that RDF graphs are indeed knowledge graphs. Property graphs, on the other hand, can't be used as knowledge graphs unless a higher-level knowledge representation is defined for them. In principle, an organization could build its own knowledge representation on top of property graphs. However, it would be proprietary and not interoperable, and any organization doing this would be duplicating many years of effort expended by the members of the World Wide Web Consortium (W3C) to define and ratify its semantic standards.

Using knowledge graph technologies, TopBraid Enterprise Data Governance (TopBraid EDG) delivers integrated data governance. To learn more, download the full whitepaper, "Building Bridges to Business Context is Essential to Data Governance."

(See also the first post on topics covered in the whitepaper: "Data Governance as A Lifecycle-Centric Asset Management Activity.")