SHACL (Shapes Constraint Language) is a powerful, recently released W3C standard for data modeling, ontology design, data validation, inferencing and data transformation. In this post, we explore some important ways in which SHACL can be used to support capabilities needed for data governance.
Below, each business capability or value relevant to data governance is introduced with a brief description, followed by an explanation of how SHACL supports the capability and a few specific examples from the use of SHACL in TopBraid Enterprise Data Governance (TopBraid EDG).
Data Governance Capability 1: Asset definition
SHACL is used in TopBraid EDG to identify what information should be captured about assets that are placed under governance.
TopBraid EDG comes with over 100 asset types that are pre-built using SHACL. These models define attributes and relationships for each type of asset. The models also drive the user interface: which fields appear on a form, the order of the fields, their division into sections and other relevant information.
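To make the idea of a model driving a form more concrete, here is a minimal Python sketch. It is not SHACL and not how TopBraid EDG is implemented; it only mimics the role that the SHACL properties sh:name, sh:group and sh:order play in a property shape. The ex: property names, the "Governance" section and the "data steward" field are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class PropertyShape:
    path: str     # property the form field edits
    label: str    # display name, analogous to sh:name
    group: str    # form section, analogous to sh:group
    order: float  # position within the section, analogous to sh:order


# Hypothetical property shapes for a glossary term asset type.
GLOSSARY_TERM_SHAPES = [
    PropertyShape("ex:definition", "definition", "Glossary Term Metadata", 3),
    PropertyShape("ex:altLabel", "alternative label", "Glossary Term Metadata", 1),
    PropertyShape("ex:acronym", "acronym", "Glossary Term Metadata", 2),
    PropertyShape("ex:steward", "data steward", "Governance", 1),
]


def form_layout(shapes):
    """Return {section: [field labels in display order]}."""
    ordered = sorted(shapes, key=lambda s: (s.group, s.order))
    return {
        section: [s.label for s in fields]
        for section, fields in groupby(ordered, key=lambda s: s.group)
    }


layout = form_layout(GLOSSARY_TERM_SHAPES)
```

Because the layout is computed from the shapes, changing a shape's order or group changes the form without touching any user interface code.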
Data Governance Capability 2: Ability to explicitly define best practices for effective data governance information collection and ensure compliance with them on data entry
SHACL is used to enforce an organization’s best practices. For instance, we may want to say that a business term must have a short description, that its length must be under 100 characters and that it must be available in both English and French. The screenshot below illustrates that these SHACL definitions (shapes) have been created in TopBraid EDG, and that an example short description of a glossary term “Access Management” violates (does not conform with) them.
SHACL offers a rich modeling language to express such data requirements and facilitate conformance to them. TopBraid EDG checks for conformance with these data quality conditions and alerts users about any inconsistencies. When users edit directly in TopBraid EDG, it prevents them from entering data that violates best practices.
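The short description rules above can be sketched in plain Python to show what the check amounts to. This is an illustration only, not SHACL syntax and not TopBraid EDG code. In SHACL itself, constraints of this kind are typically expressed with core components such as sh:minCount, sh:maxLength and sh:languageIn. The function name and the sample text are our own assumptions.

```python
MAX_LENGTH = 99  # "under 100 characters", written as an inclusive upper bound
REQUIRED_LANGUAGES = ("en", "fr")


def validate_short_description(values):
    """Check a list of (text, language_tag) pairs; return violation messages."""
    violations = []
    if not values:
        violations.append("missing short description")
    for text, lang in values:
        if len(text) > MAX_LENGTH:
            violations.append(
                f"'{lang}' value has {len(text)} characters, limit is {MAX_LENGTH}"
            )
    present = {lang for _, lang in values}
    for lang in REQUIRED_LANGUAGES:
        if lang not in present:
            violations.append(f"no value for language '{lang}'")
    return violations


# Hypothetical short description that exists only in English.
term = [("Controls who may access which systems and data.", "en")]
problems = validate_short_description(term)
```

An empty result means the term conforms. Any messages returned correspond to the kind of violations a validation report would list.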
Data Governance Capability 3: Ease of configuration and forward compatibility of customer configurations
SHACL is used to support configurability of asset information.
TopBraid EDG offers predefined models for many asset types such as business term, data element, etc.
However, organizations often need to configure the predefined models. SHACL not only offers the flexibility to define new attributes and relationships for an asset type, but also lets users ‘disable’ predefined attributes and relationships if they decide they don’t need to capture certain information. As an example of removing a field, take another look at the screenshot above (for Capability 2). It shows the built-in set of attributes and relationships for glossary terms. Note that the field “acronym” is included in this default set within the Glossary Term Metadata section. It is positioned between the ‘alternative label’ and ‘definition’ fields.
The screenshot below illustrates how a user can modify the models that underlie TopBraid EDG. For example, they can remove the acronym from the form for a glossary term by simply deactivating the SHACL shape that specifies that glossary terms can have acronyms, that the acronyms must be either plain strings or language-specific strings, and that this field should be positioned in the Glossary Metadata section of the form.
The next screenshot shows a view confirming that the property shape for ‘acronym’ has been deactivated for glossary terms in the user’s customized glossary model.
The final screenshot for this example, below, shows the resulting user-configured form for glossary terms, with the acronym property removed.
SHACL supports such configurations in a forward-compatible way. Users don’t need to worry that upgrading to the next release of TopBraid EDG will disable their configurations.
And, of course, users can also use SHACL to define new asset types.
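One way to picture why deactivation can survive an upgrade is to keep the customer's changes separate from the shipped model and combine the two when the form is built. The Python sketch below assumes exactly that. How TopBraid EDG actually stores customizations is not described here, so treat the overlay approach, the version names and the "example" field as hypothetical. The "deactivated" flag mirrors the SHACL property sh:deactivated.

```python
def effective_fields(base_model, customization):
    """Merge customer overrides onto the base model and drop deactivated shapes."""
    fields = []
    for path, shape in base_model.items():
        merged = {**shape, **customization.get(path, {})}
        if not merged.get("deactivated", False):
            fields.append(merged["label"])
    return fields


# Hypothetical shipped model, version 1.
BASE_V1 = {
    "ex:altLabel": {"label": "alternative label"},
    "ex:acronym": {"label": "acronym"},
    "ex:definition": {"label": "definition"},
}

# Hypothetical next release that adds one more field to the shipped model.
BASE_V2 = {**BASE_V1, "ex:example": {"label": "example"}}

# Customer configuration, stored apart from the shipped model.
CUSTOM = {"ex:acronym": {"deactivated": True}}

before_upgrade = effective_fields(BASE_V1, CUSTOM)
after_upgrade = effective_fields(BASE_V2, CUSTOM)
```

In this sketch the acronym stays hidden after the upgrade and the newly shipped field appears, because the customization was never written into the base model.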
Data Governance Capability 4: Ability to ensure that governance workflows conform to and support agreed best practices
SHACL is used to validate changes that are made through workflows.
Each transition of a workflow from one state to another can include SHACL statements specifying conditions that data must satisfy in order to pass on to the next state in a workflow.
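As a rough illustration of conditions attached to workflow transitions, the following Python sketch gates each state change on a list of checks. The state names, the two checks and the asset fields are hypothetical, and the checks are ordinary Python functions standing in for SHACL statements.

```python
def has_definition(asset):
    return bool(asset.get("definition", "").strip())


def has_steward(asset):
    return bool(asset.get("steward"))


# Conditions the data must satisfy for each (current state, target state) pair.
TRANSITION_CONDITIONS = {
    ("draft", "in review"): [has_definition],
    ("in review", "approved"): [has_definition, has_steward],
}


def try_transition(asset, current, target):
    """Return (new_state, names_of_failed_conditions)."""
    conditions = TRANSITION_CONDITIONS.get((current, target))
    if conditions is None:
        raise ValueError(f"no transition from {current!r} to {target!r}")
    failed = [check.__name__ for check in conditions if not check(asset)]
    # The asset stays in its current state when any condition fails.
    return (current, failed) if failed else (target, [])


term = {"definition": "Controls who may access which systems"}
first_step = try_transition(term, "draft", "in review")
second_step = try_transition(term, "in review", "approved")
```

Here the term moves to review but cannot be approved until a steward is assigned.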
Data Governance Capability 5: Role-based personalization
SHACL can be used to provide different views or perspectives for different users.
SHACL lets us define multiple shapes for a given asset type. These shapes can be associated with different user roles. As a result, different users will get different views of the information. To support the unique needs of each role, some fields can be specified as view-only. Other fields can be calculated based on available values.
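The sketch below shows the same idea in plain Python: one set of field definitions per role, with some fields view-only and one field calculated from another value. The role names, the fields and the word-count calculation are assumptions made for the example and are not taken from TopBraid EDG.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Field:
    name: str
    editable: bool = False  # view-only unless stated otherwise
    compute: Optional[Callable[[dict], object]] = None  # derived from other values


# Hypothetical views: one list of fields per user role.
ROLE_VIEWS = {
    "data steward": [
        Field("label", editable=True),
        Field("definition", editable=True),
    ],
    "business user": [
        Field("label"),
        Field("definition"),
        Field("definition word count",
              compute=lambda asset: len(asset["definition"].split())),
    ],
}


def render(asset, role):
    """Return (field name, value, editable) rows for the given role."""
    rows = []
    for field in ROLE_VIEWS[role]:
        value = field.compute(asset) if field.compute else asset.get(field.name)
        rows.append((field.name, value, field.editable))
    return rows


term = {"label": "Access Management",
        "definition": "Controls who may access which systems"}
steward_view = render(term, "data steward")
business_view = render(term, "business user")
```

Both roles look at the same underlying asset. Only the shape of the view differs.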
Data Governance Capability 6: Automating stewardship activities through enhanced data discovery
SHACL rules can be used to create new connections across collected information.
Rules defined in SHACL are directly executable and can be used to infer new information from technical metadata and data profiling snapshots, for example which columns contain personally identifiable information or where reference data is being used.
Data Governance Capability 7: Data “onboarding” ease and configurability
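To give a feel for what such a rule does, here is a small Python sketch that infers a new statement from profiling samples. It is an analogy only, since real SHACL rules are declared in the model rather than written as Python functions. The column names, the ex:containsPII property and the simple email pattern are all hypothetical.

```python
import re

# Deliberately simple pattern: something@something.something, no whitespace.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def infer_pii(profiled_columns):
    """Infer (column, property, value) triples from sampled column values."""
    inferred = set()
    for column, samples in profiled_columns.items():
        # Rule: if every sampled value looks like an email address,
        # record that the column contains PII of kind "email".
        if samples and all(EMAIL.match(value) for value in samples):
            inferred.add((column, "ex:containsPII", "email"))
    return inferred


# Hypothetical data profiling snapshot: column name -> sampled values.
snapshot = {
    "CUSTOMER.CONTACT": ["ann@example.com", "raj@example.org"],
    "CUSTOMER.REGION": ["EMEA", "APAC"],
    "CUSTOMER.NOTES": [],
}
new_facts = infer_pii(snapshot)
```

The inferred triples are new connections that no one entered by hand, which is the kind of stewardship work rules can take over.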
SHACL can be used to define data transformations.
When importing information from different sources (e.g., spreadsheets), the information needs to be mapped to the asset definitions in TopBraid EDG. SHACL lets us do this.
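A mapping of this kind can be pictured as a table from source columns to properties of the target asset type. The Python sketch below applies such a table to spreadsheet data exported as CSV. The column headings, the ex: property names and the sample rows are hypothetical, and the code stands in for a mapping that would be declared in SHACL.

```python
import csv
import io

# Hypothetical mapping from spreadsheet columns to glossary term properties.
COLUMN_TO_PROPERTY = {
    "Term": "ex:prefLabel",
    "Short Desc": "ex:shortDescription",
    "Abbrev": "ex:acronym",
}


def import_terms(csv_text):
    """Turn each spreadsheet row into a glossary term asset (a dict)."""
    assets = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        asset = {"type": "ex:GlossaryTerm"}
        for column, prop in COLUMN_TO_PROPERTY.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells instead of storing blank values
                asset[prop] = value
        assets.append(asset)
    return assets


SHEET = (
    "Term,Short Desc,Abbrev\n"
    "Access Management,Controls who may access systems,AM\n"
    "Data Element,,\n"
)
imported = import_terms(SHEET)
```

Keeping the mapping as data means a new source layout needs a new table, not new import code.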
Data Governance Capability 8: Data interface configurability
SHACL can define data interfaces.
Just as SHACL can be used to specify different views of the information for different roles, it can also define different views or data structures to be delivered to systems that must interact with TopBraid EDG to access the information it stores. The definitions can include computations and transformations needed to satisfy the requirements of consuming systems.
Data Governance Meta-Capability 9: Learn SHACL once, use it to support many needs
SHACL provides a single language for delivering a range of capabilities.
All of the value propositions and features listed here are key for an effective data governance solution. Each of them could be supported using its own technical approach, but this would require a separate investment in understanding and learning how to use each approach. Supporting all of them with SHACL brings consistency and protects customers from having to learn different approaches to take advantage of these capabilities.