This document is part of the TopQuadrant GraphQL Technology Pages
This document introduces SHACL property value rules, a proposed extension to the SHACL-AF specification. Property value rules can be used to instruct an engine to dynamically derive (or "infer") property values at query time, even if no matching statements have been asserted in the data graph. In TopBraid, these inferred property values can be queried in GraphQL like any other field, and also through SPARQL. The features described here are available from TopBraid 6.1 onwards, and the base features are also exposed via matching SHACL API builds.
RDF data and knowledge graphs contain all kinds of statements that have been explicitly entered (or asserted) by users. Many applications query these graphs to derive additional statements that can be inferred from the asserted statements. For example, assume we have assertions about persons, their gender and a parent relationship:
kennedys:JohnKennedy
    a schema:Person ;
    schema:birthDate "1917-05-29"^^xsd:date ;
    schema:deathDate "1963-11-22"^^xsd:date ;
    schema:gender "male" ;
    schema:givenName "John" ;
    schema:familyName "Kennedy" .

kennedys:CarolineKennedy
    a schema:Person ;
    schema:birthDate "1957-11-27"^^xsd:date ;
    schema:gender "female" ;
    schema:parent kennedys:JohnKennedy .

kennedys:JohnKennedyJr
    a schema:Person ;
    schema:gender "male" ;
    schema:parent kennedys:JohnKennedy .

kennedys:PatrickBKennedy
    a schema:Person ;
    schema:gender "male" ;
    schema:parent kennedys:JohnKennedy .
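Common sense defines rules that can be used to derive additional statements from this data:
- The children of a person are all persons who have that person as their parent.
- The sons of a person are those children whose gender is male.
- The siblings of a person are the other children of that person's parents.
- The full name of a person is the given name, followed by a space and the family name.
- The age of a person is computed from the birth date and the current date, as long as the person is still alive.
So the following statements could be inferred from the sample data and the inference rules above: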
kennedys:JohnKennedy
    schema:children kennedys:CarolineKennedy ;
    schema:children kennedys:JohnKennedyJr ;
    schema:children kennedys:PatrickBKennedy ;
    schema:fullName "John Kennedy" ;
    schema:son kennedys:JohnKennedyJr ;
    schema:son kennedys:PatrickBKennedy .

kennedys:CarolineKennedy
    schema:sibling kennedys:JohnKennedyJr ;
    schema:sibling kennedys:PatrickBKennedy ;
    schema:age 60 .    # in August 2018

...
The designer of a data graph may decide to assert all these statements so that they can be readily queried, yet there are downsides that often make this impractical. In addition to the larger storage requirements for the inferred triples, there is a maintenance problem: when data changes, it is not always easy to update all dependent statements. If data changes frequently, or depends on external factors (such as the current date for the age in the example), then keeping the inferences correct and up to date becomes a technical nightmare. In those cases, it is far easier to compute these values dynamically, even if the necessary computations may incur a performance penalty.
This document introduces a new mechanism that makes it possible to describe inferred values for SHACL property shapes. The new property sh:values can be used to link a property shape with a SHACL node expression that encodes instructions on how the values of the property shall be computed. SHACL node expressions were introduced with the SHACL Advanced Features 1.0 specification. In order to support more use cases, new kinds of node expressions are proposed and described here; see also the updated SHACL Advanced Features 1.1 Community Draft.
Let's go through the examples introduced above and show how they can be implemented using sh:values.
The starting point is a SHACL shape that targets all instances of schema:Person and defines property shapes for the asserted statements:
schema:Person
    a rdfs:Class, sh:NodeShape ;
    sh:property [
        sh:path schema:parent ;
        sh:class schema:Person ;
    ] ;
    sh:property [
        sh:path schema:gender ;
        sh:in ( "female" "male" ) ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path schema:birthDate ;
        sh:datatype xsd:date ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path schema:deathDate ;
        sh:datatype xsd:date ;
        sh:maxCount 1 ;
    ] .
It is often convenient to be able to query relationships in both directions. Here, only schema:parent is actually asserted in the data, yet it would be nice to query schema:children as well.
We declare schema:children using a property shape as follows:
schema:Person
    sh:property [
        sh:path schema:children ;
        sh:class schema:Person ;
        sh:values [
            sh:path [ sh:inversePath schema:parent ; ] ;
        ] ;
    ] .
The value of sh:values is a blank node with a value for sh:path. This specifies a path expression that produces all values of the specified SHACL path. Here, the path consists of the schema:parent relationship, walked in the opposite direction. The property shape also tells the system that all values of this inferred property are instances of schema:Person. This is needed so that the GraphQL processor can derive a schema from the SHACL shapes, and it could furthermore be used to validate data.
Based on this definition, we can now issue a GraphQL query using TopBraid's GraphQL support as follows:
{
  persons(uri: "http://topbraid.org/examples/kennedys#JohnKennedy") {
    label
    children {
      label
    }
  }
}
This produces the following results, computing the values of the children field by walking the schema:parent relationship in the inverse direction:
{
  "data": {
    "persons": [
      {
        "label": "John Kennedy",
        "children": [
          { "label": "John Kennedy Jr" },
          { "label": "Caroline Kennedy" },
          { "label": "Patrick B. Kennedy" }
        ]
      }
    ]
  }
}
It is perfectly fine to use more complex path expressions than this inverse relationship, including deeper traversal of relationships. See the SHACL path syntax for details.
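To make the semantics of the inverse path concrete, here is a minimal pure-Python sketch. It is not TopBraid's engine: it assumes the data graph is a set of (subject, predicate, object) tuples with prefixed names as plain strings, and the function names are hypothetical. Evaluating [ sh:inversePath schema:parent ] for a focus node simply collects the subjects of all schema:parent triples that point at it.

```python
# Sketch only: a data graph as a set of (subject, predicate, object) strings.
DATA = {
    ("kennedys:CarolineKennedy", "schema:parent", "kennedys:JohnKennedy"),
    ("kennedys:JohnKennedyJr", "schema:parent", "kennedys:JohnKennedy"),
    ("kennedys:PatrickBKennedy", "schema:parent", "kennedys:JohnKennedy"),
}


def inverse_path(graph, focus, predicate):
    """Values of [ sh:inversePath predicate ] for one focus node."""
    return {s for (s, p, o) in graph if p == predicate and o == focus}


def children(graph, focus):
    """Hypothetical evaluation of sh:values [ sh:path [ sh:inversePath schema:parent ] ]."""
    return inverse_path(graph, focus, "schema:parent")


print(sorted(children(DATA, "kennedys:JohnKennedy")))
```

The result is a set, so the order in which GraphQL lists the children carries no meaning.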
We define schema:son as follows:
schema:Person
    sh:property [
        sh:path schema:son ;
        sh:class schema:Person ;
        sh:description "The son(s) of a person. These values are inferred as the children that have male gender." ;
        sh:name "son" ;
        sh:values [
            sh:nodes [
                sh:path schema:children ;
            ] ;
            sh:filterShape [
                sh:property [
                    sh:path schema:gender ;
                    sh:hasValue "male" ;
                ] ;
            ] ;
        ] ;
    ] .
The sh:values node expression is a filter shape expression, consisting of a path expression that fetches all values of schema:children for the current focus node, and a filter shape (defined in SHACL) that these values are validated against. Only the children that have schema:gender "male" are returned as inferred values.
Such rules can also be visualized in diagrams:
Such diagrams illustrate that SHACL node expressions are essentially streams of RDF nodes (here: flowing from left to right), so that the output nodes of one step are the input to the next step. Each of these steps can modify the stream of RDF nodes, for example to filter or transform certain nodes.
Note that the sh:path of the example references schema:children, which is itself an inferred property. When path expressions are used, TopBraid recursively evaluates the required inferences, allowing rules to be chained together.
We can now issue a GraphQL query to fetch the sons of John Kennedy:
{
  persons(uri: "http://topbraid.org/examples/kennedys#JohnKennedy") {
    label
    son {
      label
    }
  }
}
{
  "data": {
    "persons": [
      {
        "label": "John Kennedy",
        "son": [
          { "label": "John Kennedy Jr" },
          { "label": "Patrick B. Kennedy" }
        ]
      }
    ]
  }
}
Filter shapes can be of arbitrary complexity, including any of the rich validation features of SHACL.
Any inferred field can be consistently used in GraphQL queries like any other (asserted) field, including for filtering. Here we ask for all persons who have at least 2 sons:
{
  persons(where: { son: { minCount: 2 } }) {
    label
  }
}
{
  "data": {
    "persons": [
      { "label": "John Kennedy" }
    ]
  }
}
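The son rule and its chaining through schema:children can be sketched in the same style. This is a hypothetical illustration, not TopBraid's implementation: a registry maps inferred predicates to Python functions, and a path over such a predicate calls the rule instead of matching triples. The filter shape is reduced to the single sh:hasValue check used above, and the last lines mimic the minCount filter from the GraphQL query.

```python
# Sketch only: triples as (subject, predicate, object) strings.
DATA = {
    ("kennedys:JohnKennedy", "schema:gender", "male"),
    ("kennedys:CarolineKennedy", "schema:gender", "female"),
    ("kennedys:CarolineKennedy", "schema:parent", "kennedys:JohnKennedy"),
    ("kennedys:JohnKennedyJr", "schema:gender", "male"),
    ("kennedys:JohnKennedyJr", "schema:parent", "kennedys:JohnKennedy"),
    ("kennedys:PatrickBKennedy", "schema:gender", "male"),
    ("kennedys:PatrickBKennedy", "schema:parent", "kennedys:JohnKennedy"),
}

# Inferred properties: predicate -> function(graph, focus) returning a set of nodes.
RULES = {}


def path_values(graph, focus, predicate):
    """Values of [ sh:path predicate ]; inferred predicates are computed recursively."""
    rule = RULES.get(predicate)
    if rule is not None:
        return rule(graph, focus)  # rule chaining
    return {o for (s, p, o) in graph if s == focus and p == predicate}


def children(graph, focus):
    # sh:values [ sh:path [ sh:inversePath schema:parent ] ]
    return {s for (s, p, o) in graph if p == "schema:parent" and o == focus}


def sons(graph, focus):
    # sh:nodes [ sh:path schema:children ] produces the stream of children ...
    candidates = path_values(graph, focus, "schema:children")
    # ... and sh:filterShape keeps only the nodes that conform to
    # [ sh:property [ sh:path schema:gender ; sh:hasValue "male" ] ]
    return {n for n in candidates if "male" in path_values(graph, n, "schema:gender")}


RULES["schema:children"] = children
RULES["schema:son"] = sons

persons = {s for (s, _, _) in DATA}
# Rough equivalent of the GraphQL filter where: { son: { minCount: 2 } }
with_two_sons = {x for x in persons if len(path_values(DATA, x, "schema:son")) >= 2}
print(sorted(path_values(DATA, "kennedys:JohnKennedy", "schema:son")), with_two_sons)
```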
Siblings of a person are defined as follows:
schema:Person
    sh:property [
        sh:path schema:sibling ;
        rdfs:comment "The siblings are inferred to be the children of the parents, minus the focus node itself." ;
        sh:class schema:Person ;
        sh:values [
            sh:nodes [
                sh:path ( schema:parent [ sh:inversePath schema:parent ] )   # schema:parent/^schema:parent
            ] ;
            sh:minus sh:this ;
        ] ;
    ] .
The node expression at sh:values above uses a path expression that first walks up to the parents of the focus person and then walks down again to their children. This yields all persons who share a parent with the focus person, including the focus person itself, which is then removed from the results using a minus expression.
{
  persons(uri: "http://topbraid.org/examples/kennedys#JohnKennedyJr") {
    label
    sibling {
      label
    }
  }
}
{
  "data": {
    "persons": [
      {
        "label": "John Kennedy Jr",
        "sibling": [
          { "label": "Patrick B. Kennedy" },
          { "label": "Caroline Kennedy" }
        ]
      }
    ]
  }
}
Actually, let's make this more interesting and use the inferred field as a filter in the query:
{
  persons(where: { sibling: { hasValue: "http://topbraid.org/examples/kennedys#JohnKennedyJr" } }) {
    label
  }
}
This produces all persons who have John Kennedy Jr as one of their siblings:
{
  "data": {
    "persons": [
      { "label": "Caroline Kennedy" },
      { "label": "Patrick B. Kennedy" }
    ]
  }
}
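The sibling rule combines a sequence path with a minus expression. In the following hypothetical Python sketch (triples as string tuples, not TopBraid code), each step of the sequence path maps a whole set of nodes to the next set, and the focus node is subtracted at the end. The last lines reproduce the hasValue filter from the query above.

```python
# Sketch only: triples as (subject, predicate, object) strings.
DATA = {
    ("kennedys:CarolineKennedy", "schema:parent", "kennedys:JohnKennedy"),
    ("kennedys:JohnKennedyJr", "schema:parent", "kennedys:JohnKennedy"),
    ("kennedys:PatrickBKennedy", "schema:parent", "kennedys:JohnKennedy"),
}


def step(graph, nodes, predicate, inverse=False):
    """One step of a sequence path, applied to a whole set of nodes."""
    if inverse:
        return {s for (s, p, o) in graph if p == predicate and o in nodes}
    return {o for (s, p, o) in graph if p == predicate and s in nodes}


def siblings(graph, focus):
    # sh:path ( schema:parent [ sh:inversePath schema:parent ] )
    parents = step(graph, {focus}, "schema:parent")
    parents_children = step(graph, parents, "schema:parent", inverse=True)
    # sh:minus sh:this removes the focus node itself
    return parents_children - {focus}


persons = {s for (s, _, _) in DATA} | {o for (_, _, o) in DATA}
# Rough equivalent of where: { sibling: { hasValue: "...#JohnKennedyJr" } }
jr_siblings = {x for x in persons if "kennedys:JohnKennedyJr" in siblings(DATA, x)}
print(sorted(siblings(DATA, "kennedys:JohnKennedyJr")), sorted(jr_siblings))
```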
For the sake of this example, the full name of a person is defined as the concatenation of the given (first) name and the family (last) name, with a space in between. SHACL node expressions can call functions, including SPARQL functions. Among the built-in SPARQL functions (see the sparql: namespace) is the CONCAT operation that we can use here:
schema:Person
    sh:property [
        a sh:PropertyShape ;
        sh:path schema:fullName ;
        sh:name "full name" ;
        sh:datatype xsd:string ;
        sh:description "A person's full name, consisting of given name and family name, separated by a space." ;
        sh:maxCount 1 ;
        sh:values [
            sparql:concat ( [ sh:path schema:givenName ] " " [ sh:path schema:familyName ] ) ;
        ] ;
    ] .
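A function expression such as sparql:concat takes its arguments from other node expressions. The Python sketch below is a hypothetical model of that idea. It assumes, without confirmation from the text above, that the function is applied to every combination of argument values, so a person without a given name (like Caroline in the sample data) gets no full name at all. The helper names path, constant and concat are illustrative only.

```python
from itertools import product

# Sketch only: triples as (subject, predicate, object) strings.
DATA = {
    ("kennedys:JohnKennedy", "schema:givenName", "John"),
    ("kennedys:JohnKennedy", "schema:familyName", "Kennedy"),
    ("kennedys:CarolineKennedy", "schema:gender", "female"),
}


def path(predicate):
    """Node expression [ sh:path predicate ], as a function of the focus node."""
    return lambda graph, focus: sorted(o for (s, p, o) in graph if s == focus and p == predicate)


def constant(value):
    """A literal argument such as " " produces exactly one value."""
    return lambda graph, focus: [value]


def concat(*args):
    """Function expression sparql:concat ( arg1 arg2 ... ).

    Assumption: the function is applied to every combination of argument
    values, so an argument that yields no value yields no result at all.
    """
    def evaluate(graph, focus):
        inputs = [arg(graph, focus) for arg in args]
        return ["".join(combination) for combination in product(*inputs)]
    return evaluate


full_name = concat(path("schema:givenName"), constant(" "), path("schema:familyName"))
print(full_name(DATA, "kennedys:JohnKennedy"))  # ['John Kennedy']
```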
Here is a screenshot from TopBraid EDG 6.1 illustrating a possible visualization of this rule:
The age of a person can be computed dynamically from the current date and the person's date of birth. As this is a reasonably complex operation, we fall back to SPARQL to implement it. SPARQL includes a NOW() function that delivers the current time stamp, and the example below uses the TopBraid function spif:timeMillis() to convert time stamps into milliseconds.
schema:Person
    sh:property [
        sh:path schema:age ;
        sh:datatype xsd:integer ;
        sh:description "A person's age derived from the current date and the given birth date. No value if the person is already deceased." ;
        sh:maxCount 1 ;
        sh:name "age" ;
        sh:values [
            sh:prefixes <http://topbraid.org/examples/schemashacl> ;
            sh:select """
                SELECT ?age
                WHERE {
                    $this schema:birthDate ?birthDate .
                    FILTER NOT EXISTS { $this schema:deathDate ?any }
                    BIND (365 * 24 * 60 * 60 * 1000 AS ?msPerYear) .
                    BIND (spif:timeMillis(NOW()) - spif:timeMillis(?birthDate) AS ?ms)
                    BIND (xsd:integer(floor(?ms / ?msPerYear)) AS ?age)
                }""" ;
        ] ;
    ] .
We can use this field to ask for all persons younger than 70:
{
  persons(where: { age: { maxExclusive: 70 } }) {
    label
    age
  }
}
In our example this only delivers one match:
{
  "data": {
    "persons": [
      { "label": "Caroline Kennedy", "age": 60 }
    ]
  }
}
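The arithmetic of the SPARQL rule can be checked with a small Python mirror. This is a sketch under stated assumptions: the xsd:date is treated as midnight with time zones ignored (an assumption about spif:timeMillis), a year is 365 days exactly as in the query, and the "current" date is pinned to a hypothetical 15 August 2018 so the result is reproducible.

```python
from datetime import date, datetime, time, timedelta

# Same approximation as the SPARQL query: a year is 365 days (leap days ignored).
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000


def age(birth_date, death_date, now):
    """Mirror of the schema:age rule; returns None where the rule yields no value."""
    if birth_date is None or death_date is not None:
        return None  # no birth date, or FILTER NOT EXISTS { $this schema:deathDate ?any } fails
    # Assumption: the xsd:date is interpreted as midnight, time zones are ignored.
    born = datetime.combine(birth_date, time())
    ms = (now - born) // timedelta(milliseconds=1)  # integer milliseconds
    return ms // MS_PER_YEAR  # floor(?ms / ?msPerYear)


# (birth date, death date) per person, taken from the sample data
PEOPLE = {
    "kennedys:JohnKennedy": (date(1917, 5, 29), date(1963, 11, 22)),
    "kennedys:CarolineKennedy": (date(1957, 11, 27), None),
    "kennedys:JohnKennedyJr": (None, None),
    "kennedys:PatrickBKennedy": (None, None),
}

NOW = datetime(2018, 8, 15)  # hypothetical fixed "current" date
AGES = {p: age(b, d, NOW) for p, (b, d) in PEOPLE.items()}
# Rough equivalent of where: { age: { maxExclusive: 70 } }; persons without an age never match
YOUNGER_THAN_70 = sorted(p for p, a in AGES.items() if a is not None and a < 70)
print(YOUNGER_THAN_70, AGES["kennedys:CarolineKennedy"])
```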
In general, SPARQL expressions can be used to further process the results of any other node expression, i.e. it is possible to chain together various node expressions and then use SPARQL to modify their results. Details can be found under SPARQL SELECT expressions and SPARQL ASK expressions. Users should of course keep potential performance pitfalls in mind, since SPARQL queries may need to be executed many times before results are produced.
FWIW, the above example could also be expressed without SPARQL syntax, using SHACL node expressions instead. The computation is quite complex, so I'll skip the source code here. The following image gives you an idea :)
Let's assume we have two databases: one with FOAF Persons, and another with schema.org Persons. Here is some sample data:
db1:KlausSchulze
    a foaf:Person ;
    foaf:firstName "Klaus" ;
    foaf:surname "Schulze" .

db2:SteveRoach
    a schema:Person ;
    schema:givenName "Steve" ;
    schema:familyName "Roach" .
However, our application is about customer management and would like to pretend that the data had the following shape instead:
db1:KlausSchulze
    ex:firstName "Klaus" ;
    ex:lastName "Schulze" ;
    ex:fullName "Klaus Schulze" .

db2:SteveRoach
    ex:firstName "Steve" ;
    ex:lastName "Roach" ;
    ex:fullName "Steve Roach" .
Using SHACL property value rules, we can create the second data structure as a virtual view on the data without moving data around.
We define a node shape that targets all instances of foaf:Person and schema:Person, and define the properties that we want to expose, including the sh:values rules to compute them when queried:
ex:Customer
    a sh:NodeShape ;
    rdfs:label "Customer" ;
    sh:targetClass foaf:Person ;
    sh:targetClass schema:Person ;
    sh:property [
        a sh:PropertyShape ;
        sh:path ex:firstName ;
        sh:name "first name" ;
        sh:description "The first name, based either on foaf:firstName or schema:givenName." ;
        sh:datatype xsd:string ;
        sh:maxCount 1 ;
        sh:values [ sh:path foaf:firstName ; ] ;
        sh:values [ sh:path schema:givenName ; ] ;
    ] ;
    sh:property [
        a sh:PropertyShape ;
        sh:path ex:lastName ;
        sh:name "last name" ;
        sh:description "The last name, based either on foaf:surname or schema:familyName." ;
        sh:datatype xsd:string ;
        sh:maxCount 1 ;
        sh:values [ sh:path foaf:surname ; ] ;
        sh:values [ sh:path schema:familyName ; ] ;
    ] ;
    sh:property [
        a sh:PropertyShape ;
        sh:path ex:fullName ;
        sh:name "full name" ;
        sh:description "The full name, consisting of first name and last name, separated by a space." ;
        sh:datatype xsd:string ;
        sh:maxCount 1 ;
        sh:values [
            sparql:concat ( [ sh:path ex:firstName ] " " [ sh:path ex:lastName ] )
        ] ;
    ] .
Note that a property can have multiple sh:values expressions, in which case the resulting values are the union of them all.
Using TopBraid's GraphQL support, we can now issue this query:
{
  customers {
    uri
    fullName
  }
}
TopBraid produces this JSON output:
{
  "data": {
    "customers": [
      { "uri": "http://example.org/db2#SteveRoach", "fullName": "Steve Roach" },
      { "uri": "http://example.org/db1#KlausSchulze", "fullName": "Klaus Schulze" }
    ]
  }
}
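The union semantics of multiple sh:values can be sketched in Python as follows. This is an illustration under assumptions, not TopBraid's engine: each virtual property maps to a list of rule functions whose results are unioned, and ex:fullName is chained on top of the two other virtual properties, just as the Turtle above chains ex:fullName on ex:firstName and ex:lastName.

```python
# Sketch only: triples as (subject, predicate, object) strings.
DATA = {
    ("db1:KlausSchulze", "rdf:type", "foaf:Person"),
    ("db1:KlausSchulze", "foaf:firstName", "Klaus"),
    ("db1:KlausSchulze", "foaf:surname", "Schulze"),
    ("db2:SteveRoach", "rdf:type", "schema:Person"),
    ("db2:SteveRoach", "schema:givenName", "Steve"),
    ("db2:SteveRoach", "schema:familyName", "Roach"),
}


def asserted(graph, focus, predicate):
    return {o for (s, p, o) in graph if s == focus and p == predicate}


# Each virtual property lists its sh:values expressions; the result is their union.
RULES = {
    "ex:firstName": [lambda g, f: asserted(g, f, "foaf:firstName"),
                     lambda g, f: asserted(g, f, "schema:givenName")],
    "ex:lastName": [lambda g, f: asserted(g, f, "foaf:surname"),
                    lambda g, f: asserted(g, f, "schema:familyName")],
}


def values(graph, focus, predicate):
    rules = RULES.get(predicate)
    if rules is None:
        return asserted(graph, focus, predicate)
    result = set()
    for rule in rules:
        result |= rule(graph, focus)
    return result


# ex:fullName is computed from the two virtual properties above (rule chaining).
RULES["ex:fullName"] = [lambda g, f: {first + " " + last
                                      for first in values(g, f, "ex:firstName")
                                      for last in values(g, f, "ex:lastName")}]

# Focus nodes selected by sh:targetClass foaf:Person and sh:targetClass schema:Person
customers = sorted(s for (s, p, o) in DATA
                   if p == "rdf:type" and o in {"foaf:Person", "schema:Person"})
print([(c, values(DATA, c, "ex:fullName")) for c in customers])
```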
TopBraid includes a magic property (aka property function) tosh:values that can be used to fetch inferred values, or to check whether a given focus node has certain inferred values for a given predicate.
Here is an example query:
SELECT *
WHERE {
    ?person a schema:Person .
    (?person schema:age) tosh:values ?age .
}
Note that this magic property can only be used to derive the right-hand value from the left-hand values, not vice versa. So the caller needs to make sure that both variables on the left-hand side are bound when tosh:values is evaluated.
This magic property makes property value rules available to any SPARQL-based technology in the TopBraid platform, including SWP, SPARQLMotion, SPIN and SHACL-SPARQL itself.
tosh:values falls back to any declared sh:defaultValue if no other value exists for the focus node and predicate.
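The calling contract of tosh:values can be pictured as an ordinary one-directional function. The Python sketch below is hypothetical: the function name inferred_values, the ex:status predicate and its default value are made up for illustration, and it assumes that only rule-computed values count as "other values" before the sh:defaultValue fallback applies.

```python
def inferred_values(graph, rules, defaults, focus, predicate):
    """Hypothetical analogue of (?focus ?predicate) tosh:values ?value.

    Both inputs must be known (bound): the function computes values from a
    given focus node and predicate, and cannot search in the other direction.
    """
    result = set()
    for rule in rules.get(predicate, []):  # union of all sh:values rules
        result |= set(rule(graph, focus))
    if not result and predicate in defaults:
        result = {defaults[predicate]}  # sh:defaultValue fallback
    return result


DATA = {
    ("kennedys:CarolineKennedy", "schema:parent", "kennedys:JohnKennedy"),
    ("kennedys:JohnKennedyJr", "schema:parent", "kennedys:JohnKennedy"),
}
RULES = {
    "schema:children": [
        lambda g, f: {s for (s, p, o) in g if p == "schema:parent" and o == f},
    ],
}
DEFAULTS = {"ex:status": "unknown"}  # hypothetical predicate and default value

print(inferred_values(DATA, RULES, DEFAULTS, "kennedys:JohnKennedy", "schema:children"))
print(inferred_values(DATA, RULES, DEFAULTS, "kennedys:JohnKennedy", "ex:status"))
```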
We are currently evaluating whether this integration with SPARQL should also work more directly with every use of an inferred property in a SPARQL query. For example, the following would then also work:
SELECT *
WHERE {
    ?person a schema:Person .
    ?person schema:age ?age .
}
We welcome feedback on whether TopBraid should support this syntax in SPARQL or whether tosh:values is sufficient.
TopBraid Enterprise Data Governance (EDG) is an agile data governance solution for today's dynamic enterprises. Among many other features, it provides an editing and browsing environment to manage metadata about data assets such as databases, database tables and database columns.
The data model behind these capabilities is built around SHACL. For example, it contains a type shape edg:DatabaseTable with a property edg:tableOf, and a type shape edg:DatabaseColumn with a property edg:columnOf.
The following screenshot shows how we are using SHACL inferences to derive all kinds of additional information for users and software agents:
In the screenshot, the inferred values are marked with a blue label (inferred). Here is the definition of "number of tables":
edg:Database
    sh:property edg:Database-tableCount .

edg:Database-tableCount
    a sh:PropertyShape ;
    sh:path edg:tableCount ;
    sh:datatype xsd:integer ;
    sh:description "The number of tables in this database, automatically computed." ;
    sh:group edg:StatisticsPropertyGroup ;
    sh:maxCount 1 ;
    sh:name "number of tables" ;
    sh:values [
        sh:count [
            sh:path [ sh:inversePath edg:tableOf ; ] ;
        ] ;
    ] .
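A count expression wraps another node expression and produces a single integer. The Python sketch below models that for the table count; it is illustrative only, with hypothetical resource names (ex:Northwind, ex:Orders, and so on) and triples stored as string tuples.

```python
# Sketch only: hypothetical EDG-style triples as (subject, predicate, object) strings.
DATA = {
    ("ex:Orders", "edg:tableOf", "ex:Northwind"),
    ("ex:Customers", "edg:tableOf", "ex:Northwind"),
    ("ex:Employees", "edg:tableOf", "ex:Northwind"),
}


def inverse_path(graph, focus, predicate):
    """Values of [ sh:inversePath predicate ] for one focus node."""
    return {s for (s, p, o) in graph if p == predicate and o == focus}


def table_count(graph, database):
    # sh:count [ sh:path [ sh:inversePath edg:tableOf ] ]:
    # the number of nodes produced by the inner path expression.
    return len(inverse_path(graph, database, "edg:tableOf"))


print(table_count(DATA, "ex:Northwind"))
```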
Using TopBraid's GraphQL service, this data can be easily queried:
Here is a more complex example, computing the total number of columns across all tables and views associated with a database.
edg:Database
    sh:property edg:Database-totalColumnCount .

edg:Database-totalColumnCount
    a sh:PropertyShape ;
    sh:path edg:totalColumnCount ;
    sh:datatype xsd:integer ;
    sh:description "The number of overall columns in this database, automatically computed." ;
    sh:group edg:StatisticsPropertyGroup ;
    sh:maxCount 1 ;
    sh:name "total number of columns" ;
    sh:order 10 ;
    sh:values [
        sh:count [
            sh:path (
                [ sh:alternativePath ( [ sh:inversePath edg:tableOf ] [ sh:inversePath edg:viewOf ] ) ]
                [ sh:inversePath edg:columnOf ; ]
            ) ;
        ] ;
    ] .
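The path in this rule is a sequence whose first step is an alternative path. In the following Python sketch (hypothetical resource names, not TopBraid code), the alternative path is the union of both inverse steps evaluated from the same start node, and the second step then walks ^edg:columnOf from every table or view before the columns are counted.

```python
# Sketch only: hypothetical EDG-style triples as (subject, predicate, object) strings.
DATA = {
    ("ex:Orders", "edg:tableOf", "ex:Db"),
    ("ex:OrderSummary", "edg:viewOf", "ex:Db"),
    ("ex:Orders.id", "edg:columnOf", "ex:Orders"),
    ("ex:Orders.date", "edg:columnOf", "ex:Orders"),
    ("ex:OrderSummary.total", "edg:columnOf", "ex:OrderSummary"),
    # A table of another database, which must not be counted
    ("ex:OtherTable", "edg:tableOf", "ex:OtherDb"),
    ("ex:OtherTable.id", "edg:columnOf", "ex:OtherTable"),
}


def inverse(graph, nodes, predicate):
    """Inverse path step applied to a set of nodes."""
    return {s for (s, p, o) in graph if p == predicate and o in nodes}


def total_column_count(graph, database):
    start = {database}
    # sh:alternativePath: union of both alternatives, evaluated from the same start nodes
    tables_and_views = inverse(graph, start, "edg:tableOf") | inverse(graph, start, "edg:viewOf")
    # Second step of the sequence path: ^edg:columnOf from every table or view
    columns = inverse(graph, tables_and_views, "edg:columnOf")
    return len(columns)  # sh:count


print(total_column_count(DATA, "ex:Db"))
```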
The new generation of form displays in TopBraid also provides a SHACL-based widget to display multiple resources in tabular form, see the Overview section in the screenshot. To produce such tables, define a SHACL shape with properties for each column that you want to render:
edg:DatabaseTableSummary
    a sh:NodeShape ;
    sh:targetClass edg:DatabaseTable ;
    rdfs:comment "A shape that can be applied to DatabaseTables to provide a summary view." ;
    rdfs:label "Database table summary" ;
    sh:property [
        a sh:PropertyShape ;
        sh:path edg:name ;
        sh:datatype xsd:string ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
        sh:name "name" ;
        sh:order 0 ;
    ] ;
    sh:property [
        a sh:PropertyShape ;
        sh:path edg:columnCount ;
        sh:datatype xsd:integer ;
        sh:description "The number of columns, inferred from columnOf triples." ;
        sh:maxCount 1 ;
        sh:name "column count" ;
        sh:order 1 ;
        sh:values [
            sh:count [
                sh:path [ sh:inversePath edg:columnOf ; ] ;
            ] ;
        ] ;
    ] ;
    sh:property [
        a sh:PropertyShape ;
        sh:path edg:recordCount ;
        sh:datatype xsd:integer ;
        sh:description "The number of records." ;
        sh:maxCount 1 ;
        sh:name "record count" ;
        sh:order 2 ;
    ] .
Such shapes can be edited with the EDG Ontology Editor or with TopBraid Composer, or with any similar tool. To instruct the system to display these summary values in an HTML table, use the following:
edg:Database
    sh:property edg:Database-tableSummary .

edg:Database-tableSummary
    a sh:PropertyShape ;
    sh:path edg:tableSummary ;
    tosh:viewWidget swa:SummaryTableViewer ;
    sh:description "The tables in this database as summaries, automatically computed." ;
    sh:group edg:OverviewPropertyGroup ;
    sh:name "table summary" ;
    sh:node edg:DatabaseTableSummary ;
    sh:values [
        sh:path [ sh:inversePath edg:tableOf ; ] ;
    ] .
As shown above, the property tosh:viewWidget provides UI metadata that is used by TopBraid and potentially other tools.
The sh:node edg:DatabaseTableSummary statement selects the shape that declares the columns to be used, and their order. From there, the system can collect all relevant information. For example, it can tell that certain properties always return xsd:integer values, which instructs the tabular display to right-align those values.
Note that in the table above, not only are the values of edg:columnCount computed on the fly, but the rows of the table themselves are inferred too. So SHACL can be used to define views on data stored in RDF, for reporting and analytical purposes.
As this little table is also driven by GraphQL, software agents and users can use the same shape definitions to run queries:
{
  databases {
    label
    largeTables: tableSummary(
      orderBy: columnCount,
      orderByDesc: true,
      where: { columnCount: { minInclusive: 10 } }
    ) {
      label
      columnCount
    }
  }
}
This produces all databases and for each database it selects an ordered list of tables that have at least 10 columns.
{
  "data": {
    "databases": [
      {
        "label": "NORTHWIND",
        "largeTables": [
          { "label": "DBO.EMPLOYEES (NORTHWIND)", "columnCount": 18 },
          { "label": "DBO.ORDERS (NORTHWIND)", "columnCount": 14 },
          { "label": "DBO.SUPPLIERS (NORTHWIND)", "columnCount": 12 },
          { "label": "DBO.CUSTOMERS (NORTHWIND)", "columnCount": 11 },
          { "label": "DBO.PRODUCTS (NORTHWIND)", "columnCount": 10 }
        ]
      }
    ]
  }
}
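Conceptually, this query filters and sorts rows that are themselves inferred. The Python sketch below models that pipeline under assumptions: the table names and column counts are hypothetical sample data, rows are the inverse edg:tableOf values, and each row's columnCount is the count of inverse edg:columnOf values. It is not how the GraphQL service is implemented.

```python
# Hypothetical sample: number of columns per table of one database.
COLUMNS_PER_TABLE = {"ex:Employees": 18, "ex:Orders": 14, "ex:Shippers": 3}

# Build (subject, predicate, object) triples from the sample.
DATA = set()
for table, n in COLUMNS_PER_TABLE.items():
    DATA.add((table, "edg:tableOf", "ex:Northwind"))
    for i in range(n):
        DATA.add((f"{table}.col{i}", "edg:columnOf", table))


def table_summary(graph, database):
    # sh:values [ sh:path [ sh:inversePath edg:tableOf ] ] infers the rows ...
    tables = {s for (s, p, o) in graph if p == "edg:tableOf" and o == database}
    # ... and each row gets an inferred edg:columnCount via sh:count.
    return [{"table": t,
             "columnCount": sum(1 for (s, p, o) in graph if p == "edg:columnOf" and o == t)}
            for t in tables]


def large_tables(graph, database, min_columns=10):
    # where: { columnCount: { minInclusive: 10 } }, orderBy: columnCount, orderByDesc: true
    rows = [r for r in table_summary(graph, database) if r["columnCount"] >= min_columns]
    return sorted(rows, key=lambda r: r["columnCount"], reverse=True)


print(large_tables(DATA, "ex:Northwind"))
```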
Note that the GraphQL service also derives values from sh:defaultValue if no other value exists for a field. In TopBraid, the values of sh:defaultValue may be node expressions too.
Note that on-the-fly inferences are only visible in certain circumstances: you cannot simply query them as you would query normal triples. Currently, they are only exposed through GraphQL fields, and to path node expressions that are used as part of a property value rule, provided that the sh:path in that node expression is an IRI. In other words, path expressions such as skos:narrower* are not supported at this stage (this is potential future work). The inferences are not exposed in SPARQL triple matches or similar technology unless a SHACL inferencing engine has been executed beforehand. In TopBraid Composer, press the Run Inferences button to materialize the inferences. In TopBraid EDG, use Transform > Execute Rules.
There is also a current limitation in how the system selects which property value rules are executed for a given focus node. The system selects node shapes that have property shapes with sh:values based on the rdf:type of the focus node. It looks for any non-deactivated node shape that is either a class that has the focus node as an instance, or that has a sh:targetClass matching one of the types of the focus node. Other kinds of targets, including user-defined targets, sh:targetSubjectsOf, sh:targetObjectsOf and sh:targetNode, are not supported at this stage due to their potential performance impact. This may be improved in future versions.
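The selection logic described in this limitation can be sketched as follows. This is a simplified, hypothetical model, not TopBraid's code: shapes and data live in one set of string triples, sh:deactivated is represented by the string "true", and, as an extra simplification beyond what is stated above, only direct rdf:type values are considered (no rdfs:subClassOf reasoning), which may differ from the real system. Shapes that use sh:targetNode are ignored, matching the stated limitation.

```python
# Sketch only: shapes and data in one set of (subject, predicate, object) strings.
DATA = {
    ("schema:Person", "rdf:type", "rdfs:Class"),
    ("schema:Person", "rdf:type", "sh:NodeShape"),
    ("schema:Person", "sh:property", "_:children"),
    ("_:children", "sh:values", "_:childrenExpr"),
    ("ex:Customer", "rdf:type", "sh:NodeShape"),
    ("ex:Customer", "sh:targetClass", "foaf:Person"),
    ("ex:Customer", "sh:property", "_:firstName"),
    ("_:firstName", "sh:values", "_:firstNameExpr"),
    # A hypothetical shape using sh:targetNode: ignored by the selection below.
    ("ex:SpecialShape", "rdf:type", "sh:NodeShape"),
    ("ex:SpecialShape", "sh:targetNode", "db1:KlausSchulze"),
    ("ex:SpecialShape", "sh:property", "_:special"),
    ("_:special", "sh:values", "_:specialExpr"),
    ("db1:KlausSchulze", "rdf:type", "foaf:Person"),
    ("kennedys:JohnKennedy", "rdf:type", "schema:Person"),
}


def objects(graph, subject, predicate):
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def has_value_rules(graph, shape):
    """True if some property shape of the node shape declares sh:values."""
    return any(objects(graph, ps, "sh:values") for ps in objects(graph, shape, "sh:property"))


def select_rule_shapes(graph, focus):
    types = objects(graph, focus, "rdf:type")  # direct types only (simplification)
    node_shapes = {s for (s, p, o) in graph if p == "rdf:type" and o == "sh:NodeShape"}
    selected = set()
    for shape in node_shapes:
        if "true" in objects(graph, shape, "sh:deactivated") or not has_value_rules(graph, shape):
            continue
        implicit_class_target = shape in types and "rdfs:Class" in objects(graph, shape, "rdf:type")
        matching_target_class = bool(objects(graph, shape, "sh:targetClass") & types)
        if implicit_class_target or matching_target_class:
            selected.add(shape)
    return selected


print(select_rule_shapes(DATA, "db1:KlausSchulze"))
```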