In a previous article in this series, we pointed out that data fabric is not a single tool.
Data fabric is a vision and associated architecture that recognizes and supports decentralized data operations.
It is an architecture in which different software products work collaboratively to provide access to data. In this article, we want to explore this topic in more detail.
Current data environments are siloed. For decades, organizations used an approach to IT characterized by the practice of “an application for everything, and a database for every application”. This resulted in many applications and just as many databases. One-to-one integration between the multitude of systems used by a modern enterprise does not scale, so organizations started to create pre-integrated data warehouses. However, as data multiplied and new data sources were added, it became increasingly obvious that copying and moving data into a single place is neither scalable nor sustainable.
The vision of data fabric is a data environment that looks like a network where people and systems collaborate on data. The network has the potential to replace traditional application-specific databases with shared data models powering application-specific use cases. When we think about a network, the concepts that often come to mind first are physical – pipes, lines, and so on. We have networks all around us – networks of railroads, highways, electrical power, radio and telecommunications. These networks were made possible through standardization of physical infrastructure. Similarly, the internet gave us network connectivity through standards such as TCP/IP and HTTP. Creation of the internet required standardization to encompass not only physical objects, but also software protocols.
For data to be networked in the way imagined by data fabric, standardization must “move up” another level. The standards must encompass shared and coordinated ontologies and taxonomies. Without shared standards, metadata collected from different sources and contexts will continue to be siloed.
To accommodate the multi-faceted nature of data and support its evolution, data standards must be open. Governance for a data fabric must, therefore, be agile to enable its continuous growth, not only allowing for the evolution of data in use but also the standards for that data. An open standard is:
- Unambiguous – expressed in a way that eliminates any doubts about its meaning, scope and context of use;
- Extensible – allows different communities to re-use and build upon the common core;
- Maintainable – offers a framework for revisions, extensions, testing, documentation and permanent access;
- Transparent – includes technical discussions and meeting minutes, and allows for the capture of issues and decisions that are archived, traceable and referenceable.
Semantic technology standards provide a solid foundation for the creation of coordinated data standards in the form of ontologies and taxonomies that provide information that is:
- Unambiguous – achieved through using semantic standards, languages that let us describe these artifacts in a clear and interoperable way;
- Extensible – due to the capabilities designed into semantic technology, one data model can be included in another by reference, new concepts can be added or expanded, and concepts not needed by some users can be deactivated.
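The extensibility described above can be illustrated with a small sketch. The plain Python below is not any particular ontology language – all class and concept names are hypothetical – but it models the same idea: one vocabulary builds on another by reference, adds new concepts, and deactivates concepts its community does not need.

```python
from dataclasses import dataclass, field

@dataclass
class Vocabulary:
    """A toy data standard: a named set of concepts that can be
    built on top of other vocabularies by reference."""
    name: str
    concepts: dict = field(default_factory=dict)   # term -> definition
    imports: list = field(default_factory=list)    # referenced vocabularies
    deactivated: set = field(default_factory=set)  # terms hidden by this community

    def resolve(self) -> dict:
        """Flatten imported vocabularies plus local concepts,
        dropping anything this community has deactivated."""
        merged = {}
        for vocab in self.imports:
            merged.update(vocab.resolve())
        merged.update(self.concepts)
        return {term: defn for term, defn in merged.items()
                if term not in self.deactivated}

# A shared core standard...
core = Vocabulary("core", concepts={
    "Party": "A person or organization",
    "Account": "A financial account",
})

# ...extended by reference: one new concept added, one core concept deactivated.
retail = Vocabulary("retail", imports=[core],
                    concepts={"Customer": "A Party holding a relationship with us"},
                    deactivated={"Account"})

print(sorted(retail.resolve()))  # → ['Customer', 'Party']
```

The important property is that `core` is never copied or modified: `retail` reaches it by reference, which is how semantic technologies avoid the copy-and-move problem described earlier.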
The power of a standard comes from the power of all the stakeholders using it.
Agreeing on the standards is never easy as it requires consensus of many different parties with varying points of view. This is especially true for data as it is often nuanced, multi-faceted, and used for different purposes as it evolves. This is why a key component of any data fabric is, in addition to standards, governance of those standards. The governance of open standards, essential to the vision of data fabric, requires:
- Participation from multiple stakeholders. Stakeholder participation requires ease of access, input and modification that can’t be achieved through the use of text documents;
- Agreement on ontologies and taxonomies. Achieving consensus for these artifacts depends on the provision of collaborative tools that support discussions, and the facilitation of processes for making decisions and capturing the results of those decisions;
- Maintenance of continuity across revisions. As standards go through revisions, some applications may transition to new versions faster than others. This means that multiple versions of standards must be accessible at the same time. Further, differences between revisions and information on who is using which revision must be readily available;
- Tools to support compliance testing against the standards. The presence of different levels of standards and, potentially, different revisions of those standards creates a strong need for this capability.
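The last two requirements – keeping revisions accessible side by side, diffing them, and testing compliance – can be sketched concretely. The minimal Python example below is illustrative only (the registry class, version numbers and term names are all hypothetical), but it shows the shape of the capability:

```python
class StandardRegistry:
    """Toy registry: keeps every revision of a standard accessible,
    reports differences between revisions, and tests data for compliance."""

    def __init__(self):
        self.revisions = {}  # version -> required terms for that revision

    def publish(self, version: str, required_terms: set):
        # Publishing never removes older revisions; all stay accessible.
        self.revisions[version] = frozenset(required_terms)

    def diff(self, old: str, new: str):
        """Terms added and terms removed between two revisions."""
        added = self.revisions[new] - self.revisions[old]
        removed = self.revisions[old] - self.revisions[new]
        return added, removed

    def check(self, version: str, record: dict):
        """Compliance test: which terms required by this revision
        are missing from the record?"""
        return self.revisions[version] - set(record)

registry = StandardRegistry()
registry.publish("1.0", {"party_id", "party_name"})
registry.publish("2.0", {"party_id", "party_name", "party_type"})

added, removed = registry.diff("1.0", "2.0")
print(added)    # → frozenset({'party_type'})

missing = registry.check("2.0", {"party_id": "42", "party_name": "Acme"})
print(missing)  # → frozenset({'party_type'})
```

A record that complies with revision 1.0 can keep working against that revision while other applications move to 2.0 – exactly the continuity-of-revisions requirement above.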
Traditionally, data standards were developed using spreadsheets and free-text documents. Today, a data fabric architecture can't be sustained through these practices. It requires a dynamic, active and collaborative environment that ensures the accessibility, transparency and maintainability of the standards powering a data fabric.