SHACL and the GraphQL Schema

This section, written for maintainers of Ontologies and other RDF/SHACL data models, explains how RDF graphs can be published through the GraphQL services of TopBraid. In a nutshell, one or more GraphQL schemas are automatically generated from data shape definitions in the Shapes Constraint Language (SHACL). These SHACL shapes may themselves be generated from other input GraphQL schemas, which are enhanced in the process with numerous features for querying data stored in an RDF dataset. SHACL data shapes can also be generated from other input formats supported by TopQuadrant’s products.

Note

The GraphQL schema also determines which assets are searchable from the Search Panel. Users can only search for assets and nested assets if they are reachable through GraphQL. Therefore, this section is also relevant if you want to customize the Search Panel.

The readers of this section are expected to be familiar with GraphQL and have basic RDF skills. Decent knowledge of SHACL is advantageous.

This section uses the prefix dash, which represents the namespace http://datashapes.org/dash#, and the prefix graphql, which represents the namespace http://datashapes.org/graphql#. Both graphs are automatically included in every TopBraid EDG Ontology.

Selecting the Shapes

An RDF graph may contain thousands of classes or data shapes. A GraphQL service that includes all of them at once would quickly become unusable. For now we will focus on asset collections of type Data Graph. To instruct the processor which shapes and classes shall be exposed via GraphQL for Data Graphs, the starting point is the “Home” asset of an Ontology that is included in the Data Graph. In such an Ontology, use the drop-down box in the upper right corner of the form to switch to the GraphQL Schema view. The Home asset must use the following properties to include or exclude shapes:

Figure: Use the GraphQL Schema view of the Ontology’s Home asset to edit what gets exposed by GraphQL.

GraphQL Schema generation properties
Property	Description
graphql:publicShape	The values are included in the GraphQL schema
graphql:publicClass	The values and all their subclasses are included
graphql:publicNamespace	All shapes from the given namespace are included
graphql:protectedShape	The values are included but not available from the root query
graphql:protectedClass	The values and all their subclasses are included but not available from the root query
graphql:privateShape	The values are excluded from the GraphQL schema (even if published by other properties)

The algorithm that produces the set of published shapes first collects all shapes and classes declared through the graphql:publicXY and graphql:protectedXY properties above. It starts at the schema resource and also follows the values of owl:imports and rdf:type transitively. It then removes any shapes that are marked via graphql:privateShape.
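The selection step can be pictured as plain set arithmetic. The following Python sketch is illustrative only and is not TopBraid's implementation: the function name and the dictionary layout are assumptions, ex:Droid and ex:Starship are made-up shapes, and subclass expansion, graphql:publicNamespace and the traversal of owl:imports are left out for brevity.

```python
def published_shapes(schema):
    """Return (root_queryable, reachable_only) sets of shape names.

    `schema` is a hypothetical dict whose keys mirror the graphql:
    properties from the table above; missing keys count as empty.
    """
    public = set(schema.get("publicShape", [])) | set(schema.get("publicClass", []))
    protected = set(schema.get("protectedShape", [])) | set(schema.get("protectedClass", []))
    private = set(schema.get("privateShape", []))

    # graphql:privateShape wins, even if another property published the shape.
    root_queryable = public - private
    # Protected shapes stay in the schema but get no root query field.
    # How a shape that is both public and protected behaves is not
    # covered by this sketch.
    reachable_only = protected - private
    return root_queryable, reachable_only


example = {
    "publicShape": ["ex:Human", "ex:Droid"],
    "protectedShape": ["ex:Starship"],
    "privateShape": ["ex:Droid"],
}
root, nested = published_shapes(example)
print(sorted(root), sorted(nested))
```

In this sketch ex:Droid drops out entirely because it is marked private, while ex:Starship remains in the schema without a root query field.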

All published shapes can be queried via GraphQL. Public shapes are automatically exposed as fields of the root query object. Shapes that are marked protected cannot be queried from the root query, but they can be reached and traversed from other object types (and are therefore also available to the Search Panel). Here is an example:

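# The ex: namespace below is an illustrative placeholder.
@prefix ex:      <http://example.org/ns#> .
@prefix graphql: <http://datashapes.org/graphql#> .
@prefix sh:      <http://www.w3.org/ns/shacl#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

ex:MySchema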
    a graphql:Schema ;
    graphql:publicShape ex:Human .

ex:Human
    a sh:NodeShape ;
    sh:property [
        sh:path ex:id ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:order "0"^^xsd:decimal ;
        graphql:isIDField true ;
    ] ;
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:order "1"^^xsd:decimal ;
    ] ;
    sh:property [
        sh:path ex:height ;
        sh:datatype xsd:decimal ;
        sh:maxCount 1 ;
        sh:order "2"^^xsd:decimal ;
    ] ;
    sh:property [
        sh:path ex:friends ;
        sh:node ex:Human ;
        sh:order "3"^^xsd:decimal ;
    ].

From these SHACL shapes, the processor will internally generate the following GraphQL schema:

schema {
	query: RootRDFQuery
}

type RootRDFQuery {
	humans (... filters etc, see later...): [Human]
	... generated fields for aggregations and introspection ...
}

type Human {
	uri: ID!
	label: String!
	id (... filters etc...): ID!
	name (... filters etc...): String!
	height (... filters etc...): Float
	friends (... filters etc...): [Human]
	... generated fields for aggregations, derived values ...
}

As shown above, the system automatically produces a root query object that has one field for every public shape, named after the plural form of the shape name (here, humans for Human). These root query fields accept a large number of arguments that select which of the matching objects shall be returned; see GraphQL Queries.

Completing this introductory example, here is an example GraphQL query against this schema, returning all humans where the name starts with L, and all their friends, translating the height from meters to feet.

{
    humans (where: {name:{pattern:"^L"}}, orderBy: name) {
        id
        name
        height (transform: "$height / 0.3048")
        friends {
            id
            name
        }
    }
}

A possible result JSON would be:

{
	"data": {
		"humans": [
			{
				"id": "1003",
				"name": "Leia Organa",
				"height": 4.921259842519685,
				"friends": [
					{
						"id": "1002",
						"name": "Han Solo"
					},
					{
						"id": "1000",
						"name": "Luke Skywalker"
					}
				]
			}
		]
	}
}
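The height in this result is consistent with a stored value of 1.5 meters. That stored value is inferred from the output and is not stated by the example data. A quick check of the arithmetic, as a sketch:

```python
METERS_PER_FOOT = 0.3048  # exact by definition of the international foot


def meters_to_feet(meters):
    # Mirrors the transform expression "$height / 0.3048" from the query.
    return meters / METERS_PER_FOOT


feet = meters_to_feet(1.5)  # 1.5 m is an assumed stored height
print(round(feet, 4))
```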

For asset collection types other than Data Graphs, the situation is slightly different. Those types already have a pre-defined list of public classes that become queryable through GraphQL and the Search Panel. For example, Taxonomies declare the classes skos:Concept, skos:ConceptScheme and skosxl:Label as public. This information is attached to the dedicated schema class taxonomies:Schema.

For other asset collection types, there are similar schema classes such as edg:GlossaryProject. These are generally the rdf:type of the Home resource.

You can use the properties mentioned above, such as graphql:publicClass, on those schema resources. For example, in order to add a class ex:Person to the GraphQL schema of your Taxonomy, add the triple taxonomies:Taxonomy graphql:publicClass ex:Person to a suitable Ontology.

Objects and Fields

For each published node shape in a schema, the processor will create one GraphQL object type as described in the following sections.

Object Types for Node Shapes

The name of this object type will be derived using the following rules (in order):

1. Use the value of graphql:name of the shape.
2. Use the local name (i.e. the part of the shape URI after a separator such as ‘/’ or ‘#’), replacing ‘-’ with ‘_’, if that is a valid GraphQL name.

If there is more than one object type with the same name (e.g. from different namespaces but with the same local name), then prepend the prefix of the namespace and ‘_’. For example, ex:Human would become ex_Human.

In general, the mapping is rather strict when the underlying shape definitions are invalid. For example, if no valid name can be produced for a shape, then the schema is rejected and the user is encouraged to add suitable graphql:name triples.
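The naming rules above can be sketched in a few lines of Python. This is a simplified illustration and not the actual processor: the function names and the input layout are hypothetical, the example URIs are placeholders, and the sketch assumes that an explicit graphql:name is prefixed on a clash in the same way as a local name, which the text does not state.

```python
import re

# Names allowed by the GraphQL specification: letters, digits and
# underscores, not starting with a digit.
GRAPHQL_NAME = re.compile(r"^[_A-Za-z][_0-9A-Za-z]*$")


def local_name(uri):
    """Part of the URI after the last '/' or '#'."""
    return re.split(r"[/#]", uri)[-1]


def candidate_name(uri, explicit_name=None):
    """Apply the two naming rules in order; raise if neither yields a name."""
    if explicit_name:  # rule 1: graphql:name
        return explicit_name
    name = local_name(uri).replace("-", "_")  # rule 2: local name
    if GRAPHQL_NAME.match(name):
        return name
    raise ValueError(f"No valid GraphQL name for {uri}; add graphql:name")


def object_type_names(shapes):
    """Map shape URI -> type name, prefixing names that clash.

    `shapes` maps URI -> (namespace prefix, graphql:name or None).
    """
    base = {uri: candidate_name(uri, name) for uri, (_, name) in shapes.items()}
    counts = {}
    for n in base.values():
        counts[n] = counts.get(n, 0) + 1
    return {
        uri: f"{shapes[uri][0]}_{n}" if counts[n] > 1 else n
        for uri, n in base.items()
    }


names = object_type_names({
    "http://example.org/ns#Human": ("ex", None),
    "http://other.example/Human": ("other", None),
    "http://example.org/ns#star-ship": ("ex", None),
})
print(names)
```

Here the two Human shapes clash and are prefixed, while star-ship only needs its hyphen replaced. A local name such as 2fast would raise an error, mirroring the rejected schema described above.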

The uri Field

Each generated object type has a built-in field called uri that can be used to retrieve the URI of the RDF resource. For blank nodes this is an internal identifier starting with _:. In general, these blank node identifiers can be used interchangeably with URIs.

The label Field

Each generated object type has a built-in field called label that can be used to retrieve a human-readable label for an object. This label is typically derived from the rdfs:label (or a similar property) and should use the preferred language of the client, if multi-lingual labels exist. The label field always returns something, falling back to the local name of the underlying RDF resource, or an internal identifier starting with _: for blank nodes.

Fields for Property Shapes

The object types produced from a node shape will have one field for each distinct sh:path that is defined at any property shape of the node shape. If the node shape is also an rdfs:Class then this includes any property shape of the (transitive) superclasses. Furthermore, any property shapes attached to values of sh:node of the node shape will (recursively) be included. (As a general pattern, rdfs:subClassOf and sh:node are treated uniformly, i.e. sh:node is an extension and inheritance mechanism similar to subclassing.)

The names of these fields are derived using the same rules as for object types, i.e. checking graphql:name first, then the local name of the sh:path (if that’s a URI), and prepending a prefix if duplicate names would exist. Note that if a property shape is about a complex SHACL path, then a graphql:name is strongly recommended.

The type of these generated fields is derived from the sh:datatype, sh:node or sh:class. For example, sh:datatype xsd:boolean gets mapped to Boolean and sh:datatype xsd:decimal to Float. To produce ID, annotate the property shape with graphql:isIDField true combined with sh:datatype xsd:string.
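As a rough illustration of the datatype mapping, the following Python sketch covers only the cases stated or shown in this section. The function name is hypothetical, and datatypes not mentioned here are deliberately left out rather than guessed.

```python
# Only the mappings stated or shown in this section.
SCALARS = {
    "xsd:string": "String",
    "xsd:boolean": "Boolean",
    "xsd:decimal": "Float",
}


def scalar_type(datatype, is_id_field=False):
    """Return the GraphQL scalar name for a property shape's sh:datatype."""
    # graphql:isIDField true combined with xsd:string produces ID.
    if is_id_field and datatype == "xsd:string":
        return "ID"
    if datatype not in SCALARS:
        raise KeyError(f"mapping for {datatype} is not covered by this sketch")
    return SCALARS[datatype]


print(scalar_type("xsd:string", is_id_field=True))
print(scalar_type("xsd:decimal"))
```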

If the property shape defines an sh:or list with at least one member, and all members of that list are node shapes with a URI, then a union type will be generated automatically. If the property shape has an sh:or list consisting of xsd:string and rdf:langString, in either order, then the object type LangString (with fields lang and string) will be used. For other sh:or lists where the first entry is an sh:datatype shape, that specified datatype will be used.
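The three sh:or cases can be summarized as a small decision function. This Python sketch uses a hypothetical, simplified encoding of the list members (dicts with either a "node" or a "datatype" key) and makes no claim about how the union type is named. ex:Droid is a made-up shape.

```python
LANG_STRING = {"xsd:string", "rdf:langString"}


def classify_or_list(members):
    """Classify an sh:or list given as dicts such as {"node": "ex:Human"}
    or {"datatype": "xsd:string"}."""
    # Case 1: every member is a node shape with a URI -> union type.
    if members and all("node" in m for m in members):
        return ("union", [m["node"] for m in members])
    # Case 2: xsd:string and rdf:langString in either order -> LangString.
    datatypes = [m.get("datatype") for m in members]
    if len(members) == 2 and set(datatypes) == LANG_STRING:
        return ("object", "LangString")
    # Case 3: otherwise the datatype of the first entry is used.
    if members and "datatype" in members[0]:
        return ("datatype", members[0]["datatype"])
    return ("unsupported", None)


print(classify_or_list([{"node": "ex:Human"}, {"node": "ex:Droid"}]))
print(classify_or_list([{"datatype": "rdf:langString"}, {"datatype": "xsd:string"}]))
```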

For object-valued properties for which there is no matching GraphQL object type, the system falls back to a built-in special type _Resource that only offers uri and label fields. This type is for example used for links that typically go outside of the published schema, e.g. rdf:type values.

Fields are list-typed unless there is a property shape with sh:maxCount 1. Fields are marked as non-nullable (with !) if there is an sh:minCount 1.
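The cardinality rules can be sketched as a small helper that wraps a base type name. This is an illustration under stated assumptions, not the processor's code: the function name is hypothetical, and placing the ! after the closing bracket of a list type is an assumption, since this section only shows non-nullable single-valued fields.

```python
def field_type(base, min_count=0, max_count=None):
    """Wrap a base type name according to sh:minCount / sh:maxCount."""
    # List-typed unless sh:maxCount is 1.
    rendered = base if max_count == 1 else f"[{base}]"
    # Only the literal case sh:minCount 1 from the text is handled here.
    if min_count == 1:
        rendered += "!"
    return rendered


print(field_type("String", min_count=1, max_count=1))
print(field_type("Float", max_count=1))
print(field_type("Human"))
```

These three calls correspond to the name, height and friends fields of the Human example earlier in this section.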

Note that, in general, any property shape that is marked as sh:deactivated true is ignored by the processor.

Defining Multiple GraphQL Schemas

A dataset may contain many named graphs and heterogeneous data. It is possible to define multiple GraphQL schemas for the same data, and in the same shapes graph. Each instance of graphql:Schema can either be identified by its URI or by its graphql:name. This is best explained through an example.

ex:MySchema
	a graphql:Schema ;
	graphql:name "starwars" ;
	graphql:publicShape ex:Human .

The above schema would be available through the URL pattern [server]/graphql/[dataset]/starwars.

If a schema does not carry a graphql:name, then it can be accessed via the qname of its URI, replacing ‘:’ with ‘_’. For the example above, [server]/graphql/[dataset]/ex_MySchema would also work.
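A client could assemble the endpoint URL along these lines. The helper below is a hypothetical sketch. The server address and dataset identifier are placeholders, and the actual form of the [dataset] segment depends on the installation.

```python
def schema_endpoint(server, dataset, qname, graphql_name=None):
    """Build the endpoint URL for a graphql:Schema instance."""
    # Use graphql:name if present; otherwise the qname with ':' -> '_'.
    segment = graphql_name or qname.replace(":", "_")
    return f"{server}/graphql/{dataset}/{segment}"


# Placeholder server and dataset values, for illustration only.
print(schema_endpoint("https://edg.example.org", "mydata", "ex:MySchema", "starwars"))
print(schema_endpoint("https://edg.example.org", "mydata", "ex:MySchema"))
```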