SHACL Tutorial: SPARQL-based Constraints

Updated for the SHACL W3C Recommendation and TopBraid 5.3

Introduction and Prerequisites

This short tutorial explains how to express rich semantic constraints on an RDF data model using SHACL, and its SPARQL-based constraints syntax in particular.

For this tutorial you should be familiar with the basic SHACL syntax; see the Getting Started Tutorial from this series. You should also be familiar with SPARQL. The examples in this tutorial use the RDF Turtle notation. We also show screenshots of TopBraid Composer for those who would like to follow along with a graphical editing tool. To execute the scenario from Java code, consider using the TopBraid SHACL API.

Setting up the Example Scenario

For this tutorial, we use the well-known schema.org data model, which can be downloaded in various formats (e.g., a SHACL version). In particular, we use the hotel star rating part of the schema.org model. Our task is to make sure that a warning is reported whenever a Hotel has more than one review by the same person, to ensure that nobody has tampered with the hotel’s reputation. In a nutshell, the data model surrounding this looks like:

SHACL-Tutorial-SPARQL-ClassDiagram

Let’s create a new SHACL file in TopBraid Composer, and import the schema.org namespace into it:

SHACL-Tutorial-SPARQL-CreateFile

The source code of the resulting file looks like this:

<http://example.org/hotelratings>
    a owl:Ontology ;
    owl:imports <http://datashapes.org/dash> ;
    owl:imports <http://topbraid.org/schema/schema-single-range> .

As a next step, we need to prepare some namespace prefixes. First, we create a prefix for “ex”, e.g. by replacing the “hotelratings” namespace as shown below. Second, we need to make sure that all prefixes that we want to use in SHACL-SPARQL are properly declared as SHACL triples. This is done by selecting the “schema” namespace in the table and pressing the “SHACL” button highlighted below:

SHACL-SPARQL-Namespaces

The result of this is that the graph now includes a sh:PrefixDeclaration:

<http://example.org/hotelratings>
    a owl:Ontology ;
    sh:declare [
        a sh:PrefixDeclaration ;
        sh:namespace "http://schema.org/"^^xsd:anyURI ;
        sh:prefix "schema" ;
    ] .
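The Turtle snippets in this tutorial omit their @prefix header. As a minimal sketch, a header along the following lines would make them parseable as a stand-alone file. The standard namespaces are the well-known ones; the "ex" namespace IRI is an assumption, since its exact value appears only in the screenshot above.

```turtle
# Standard namespaces used by the snippets in this tutorial
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

# Assumption: hypothetical stand-in for the "ex" namespace,
# which this text does not spell out
@prefix ex:     <http://example.org/hotelratings#> .
```

Note that these @prefix lines only serve the Turtle parser. The SPARQL query inside a shape does not see them, which is why the sh:declare triples are needed as well.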

To bring the example to life, we need some example instances. Let’s create a couple of instances of schema:Hotel and schema:Person, and then some instances of schema:Rating that connect them.

ex:HotelBrimbelle
    a schema:Hotel ;
    schema:starRating [
        a schema:Rating ;
        schema:author ex:MirellaBella ;
        schema:ratingValue 5.0 ;
    ] ;
    schema:starRating [
        a schema:Rating ;
        schema:author ex:MirellaBella ;
        schema:ratingValue 4.5 ;
    ] .

ex:HotelParadiso
    a schema:Hotel ;
    schema:starRating [
        a schema:Rating ;
        schema:author ex:JimCandler ;
        schema:ratingValue 3.5 ;
    ] .

ex:JimCandler a schema:Person .
ex:MirellaBella a schema:Person .

Adding a Constraint to the Data Model

Using this data, we can use a SPARQL editor such as the one built into TopBraid to ask for all hotels that have multiple ratings from the same author:

SHACL-Tutorial-SPARQL-Query

The query above gets two distinct values of schema:starRating that share the same ?author. Note that the != filter is necessary because ?rating1 and ?rating2 could otherwise bind to the same rating, so that every hotel with at least one authored rating would match. Also note that we are using the variable $this to refer to the instances of schema:Hotel that we are interested in. This choice of variable makes it easy to move back and forth between stand-alone SPARQL queries and those stored in SHACL shapes.
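For readers without the screenshot, here is a sketch of such a stand-alone query, reconstructed from the constraint defined later in this tutorial. The PREFIX line and the "$this a schema:Hotel" pattern are assumptions added so that it runs outside of SHACL; inside a shape, the prefix comes from sh:prefixes and the engine supplies the values of $this.

```sparql
PREFIX schema: <http://schema.org/>

SELECT DISTINCT $this ?author
WHERE {
    # Assumption: outside SHACL nothing pre-binds $this,
    # so we restrict it to hotels ourselves
    $this a schema:Hotel .
    $this schema:starRating ?rating1 .
    ?rating1 schema:author ?author .
    $this schema:starRating ?rating2 .
    ?rating2 schema:author ?author .
    # Exclude solutions where both variables match the same rating
    FILTER (?rating1 != ?rating2)
}
```

Against the example data, only ex:HotelBrimbelle with ex:MirellaBella matches, because ex:HotelParadiso has a single rating.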

Once we have sufficiently experimented with our query, and get some sensible results, we can turn the query into a constraint. Every SPARQL-based SHACL constraint must be part of a shape definition. In our case, we want the constraint to apply to all instances of schema:Hotel, so we simply turn the class schema:Hotel into a SHACL shape by adding an rdf:type sh:NodeShape statement. With the class schema:Hotel selected, the easiest way to do that in TopBraid is to press the little Enable SHACL constraints for this class button in the upper right corner:

SHACL-Tutorial-SPARQL-EnableClass

Now we can attach a SPARQL constraint to schema:Hotel, via the sh:sparql property. All we need to do is select Add empty row from the context menu behind sh:sparql and paste the SPARQL query into the text box under sh:select:

SHACL-SPARQL-Constraint

The source code of this constraint in the context of the Hotel shape is as follows:

schema:Hotel
    a owl:Class ;
    a sh:NodeShape ;
    sh:severity sh:Warning ;
    rdfs:subClassOf schema:LodgingBusiness ;
    sh:sparql [
        sh:message "Hotel has multiple reviews by {?author}" ;
        sh:prefixes <http://example.org/hotelratings> ;
        sh:select """
            SELECT DISTINCT $this ?author
            WHERE {
                $this schema:starRating ?rating1 .
                ?rating1 schema:author ?author .
                $this schema:starRating ?rating2 .
                ?rating2 schema:author ?author .
                FILTER (?rating1 != ?rating2) .
            }""" ;
    ] .

To verify that the constraint returns correct and useful results, use the SHACL Validation view, or some other SHACL tool of your choice:

SHACL-Tutorial-SPARQL-Results

How does it all work?

Now that we have seen this scenario in action, let’s examine how this all works. SHACL shapes define constraints on a collection of nodes in your data. In this case, the nodes targeted by the schema:Hotel shape are the instances of the schema:Hotel class. These instances are called the target nodes of the shape. If an RDFS or OWL class is also declared to be a shape, then its target nodes are its own instances, including the instances of its subclasses.

Shapes can define constraints in multiple ways. One of them is to use sh:property to define property constraints, as explained in the Getting Started part of this tutorial. If you need to validate more complex constraints, then SPARQL is a good choice: it is very expressive, allowing you to probe your data with almost arbitrary queries, join nodes from different places, or even do string and math operations. To connect a SPARQL query with your shape, use the sh:sparql property to attach a constraint node to the shape, and sh:select on that node to hold the actual query string.

At execution time, the SPARQL query can reference the current instance using the variable $this. The SHACL engine makes sure that $this is bound to each instance of schema:Hotel in turn, so you do not need to collect those values first. The SELECT query must return one row for every value of $this that violates the constraint. If your SELECT query returns no results, then everything is OK, and no violation is produced.
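To make this concrete, here is a rough sketch of the validation result that a SHACL engine could report for ex:HotelBrimbelle from the example data. The property names come from the SHACL vocabulary; the exact rendering of the message text and any additional properties depend on the tool, so treat those as assumptions. Prefix declarations are omitted, as in the other snippets.

```turtle
[
    a sh:ValidationResult ;
    # The value of $this in the row returned by the SELECT query
    sh:focusNode ex:HotelBrimbelle ;
    # Taken from sh:severity on the surrounding shape
    sh:resultSeverity sh:Warning ;
    sh:sourceConstraintComponent sh:SPARQLConstraintComponent ;
    sh:sourceShape schema:Hotel ;
    # Assumption: how the value of {?author} is rendered is tool-dependent
    sh:resultMessage "Hotel has multiple reviews by ex:MirellaBella" ;
] .
```

No result is reported for ex:HotelParadiso, because the query returns no row for it.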

You may return other values in the SELECT query to provide more details about the properties and values involved in the constraint violation; see the SHACL spec. In our example above, we return the values of ?author so that we can inject them into the message that is presented to the user; see sh:message.

Finally, each SPARQL constraint may have a comment explaining the purpose and implementation details of the constraint. To signal whether the system should produce errors, warnings or info messages (sh:Violation, sh:Warning or sh:Info), set sh:severity on the surrounding shape.
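As a sketch of where these two pieces sit, the Hotel shape could be annotated as follows. Using rdfs:comment for the comment is an assumption, since the text above does not name a property, and the comment wording is purely illustrative.

```turtle
schema:Hotel
    a sh:NodeShape ;
    # Applies to all constraints of this shape; the default is sh:Violation
    sh:severity sh:Warning ;
    sh:sparql [
        # Assumption: rdfs:comment as the documentation property
        rdfs:comment "Reports hotels that have two different ratings by the same author." ;
        sh:message "Hotel has multiple reviews by {?author}" ;
        sh:prefixes <http://example.org/hotelratings> ;
        sh:select """
            SELECT DISTINCT $this ?author
            WHERE {
                $this schema:starRating ?rating1 .
                ?rating1 schema:author ?author .
                $this schema:starRating ?rating2 .
                ?rating2 schema:author ?author .
                FILTER (?rating1 != ?rating2) .
            }""" ;
    ] .
```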