Querying RDF Graphs with GraphQL

This document is part of the TopQuadrant GraphQL Technology Pages

This document describes the features of the TopBraid GraphQL services. TopBraid products automatically generate GraphQL schemas from existing SHACL shape definitions, as explained in Publishing RDF/SHACL Graphs as GraphQL. The generated GraphQL schemas include built-in facilities to filter data using direct value matching, complex query patterns and even SPARQL expressions. Furthermore, the schemas support aggregation, ordering and paging of results, as well as values that are derived dynamically for the JSON response. Using the schemas, implementations can convert RDF graph data into JSON object structures (and back).

Introduction

TopBraid takes SHACL shape definitions or GraphQL schemas as input and generates an enhanced GraphQL schema that provides numerous features to query data stored in an RDF dataset. In this document we take the viewpoint of a typical GraphQL user, such as a UI developer or data analyst, to explain which features are available. Our starting point is a GraphQL schema:

type Human {
	id: ID!
	name: String!
	height: Float
	friends: [Human]
}

The processor will internally convert it to SHACL and then generate the following GraphQL schema:

schema {
	query: RootRDFQuery
}

type RootRDFQuery {
	humans (... filters etc, see later...): [Human]
	... generated fields for aggregations and introspection ...
}

type Human {
	uri: ID!
	label: String!
	id (... filters etc...): ID!
	name (... filters etc...): String!
	height (... filters etc...): Float
	friends (... filters etc...): [Human]
	... generated fields for aggregations, derived values ...
}

As shown above, the system automatically produces a root query object that has a field for every public shape, whose name is essentially the plural form of the shape name. These root query fields can take a large number of arguments to select which of the matching objects are returned; these arguments are described in the following sections.

Each object type has a field uri that holds the URI of the underlying RDF resource and a field label that returns a human-readable display label.

To complete this introductory example, the following GraphQL query against this schema returns all humans whose name starts with L, together with their friends, and converts the height from meters to feet.

{
	humans (where: {name:{pattern:"^L"}}, orderBy: name) {
		id
		name
		height (transform: "$height / 0.3048")
		friends {
			id
			name
		}
	}
}

A possible result JSON would be:

{
	"data": {
		"humans": [
			{
				"id": "1003",
				"name": "Leia Organa",
				"height": 4.921259842519685,
				"friends": [
					{
						"id": "1002",
						"name": "Han Solo"
					},
					{
						"id": "1000",
						"name": "Luke Skywalker"
					}
				]
			},
			...
		]
	}
}

Filtering

Most fields that are generated by the engine can take arguments to filter which values are returned. The various kinds of filters are described in the following subsections.

Filtering by Property Values

It is a common design pattern in GraphQL to allow filtering by direct value matches, e.g. give me all humans that appeared in "JEDI":

{
	humans(appearsIn: "JEDI") {
		name
	}
}

The processor produces one such argument for each field that is derived from the property shapes. This is done for the top-level query fields and the object-valued fields of each generated object type. The values of these arguments must be scalar JSON values. In order to match an object-valued property, use the URIs of the values as ID strings.

Use uri: "..." to only return exactly the object with the given URI.

Filtering by Constraints

The where argument is an expressive way to filter values based on constraints similar to SHACL. It is available for all object-valued declared fields including the root query fields. The values of where are input objects with internal names such as Human_where and fields for each declared field of the type that is constrained.

This is best explained by means of an example. The following query returns all humans where at least one starship exists that has a length greater than or equal to 30 units:

{
	humans(where: {
			starships: {
				exists: {
					length: {
						minInclusive: 30
					}
				}
			}
		}) {
		name
		height
		homePlanet
		starships {
			name
			length
		}
	}
}

The following types of constraints are (currently) supported.

Parameter	Type	Condition
`hasValue`	Same as field	Object must have the given value (possibly among others)
`minCount`	`Int`	Object must have at least `minCount` values
`maxCount`	`Int`	Object must have at most `maxCount` values
`minExclusive`	Same as field	Object must have a value so that `value > minExclusive`
`minInclusive`	Same as field	Object must have a value so that `value >= minInclusive`
`maxInclusive`	Same as field	Object must have a value so that `value <= maxInclusive`
`maxExclusive`	Same as field	Object must have a value so that `value < maxExclusive`
`pattern`	`String`	Object must have a value that matches the given regular expression
`flags`	`String`	Optional flags for the `pattern` regex engine, such as "i" to ignore case
`exists`	Nested object	One of the values must conform to all nested constraints

Filtering by Text

Object-valued fields can use the argument queryText to allow free-text search across the datatype values of any declared field of the object type. The values of queryText are interpreted as regular expressions based on SPARQL's regex operator and ignoring case.

The following example produces all humans where any scalar field (such as name, height or homePlanet) has a value starting with L:

{
	humans (queryText: "^l") {
		name
	}
}
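
To make the matching rule concrete, here is a minimal Python sketch of a case-insensitive regex search across the scalar fields of an object. It is a client-side approximation only: the function name query_text_match and the sample records are made up, and Python's re syntax is not identical to the regex dialect used by SPARQL, although simple patterns such as ^l behave the same way.

```python
import re

def query_text_match(obj, query_text):
    """True if any scalar value of `obj` matches `query_text`, ignoring case."""
    rx = re.compile(query_text, re.IGNORECASE)
    return any(
        rx.search(str(value))
        for value in obj.values()
        if not isinstance(value, (list, dict))  # skip object-valued fields
    )

# Made-up sample records; the values are for illustration only.
records = [
    {"name": "Luke Skywalker", "height": 1.72, "homePlanet": "Tatooine"},
    {"name": "Han Solo", "height": 1.8, "friends": ["Luke Skywalker"]},
    {"name": "Example Person", "homePlanet": "Lothal"},
]
hits = [r["name"] for r in records if query_text_match(r, "^l")]
print(hits)
```

The third record is matched through homePlanet rather than name, which illustrates that the search covers every scalar field and not only the display name.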

Filtering by SPARQL Expressions

As the ultimate fallback with a maximum of expressivity, any field can take the argument filter, the values of which must be valid SPARQL FILTER expressions. When these SPARQL expressions are evaluated, certain variables have pre-defined values. The variables $label and $uri have the corresponding field values, and the special variable $this refers to the current query object (RDF resource). Furthermore, for each declared field of the object type that is the value type of the queried field, a corresponding variable will hold the value of the object that is matched. For example, the following query retrieves all humans that are taller than 1.6 units:

{
	humans (filter: "$height > 1.6") {
		name
	}
}

Note that these pre-bound values are only supported for single-valued properties (with sh:maxCount 1) because the system would otherwise need to pick a "random" value from the underlying database. In order to query values of multi-valued properties, use SPARQL EXISTS or NOT EXISTS expressions.

Aggregations

As of TopBraid 6.1 this feature is deactivated by default (to reduce the complexity of the generated GraphQL schema). If needed, enable it using the Server Configuration Parameters page, or use Property Value Rules to expose selected kinds of aggregations.

Most database query languages support some form of aggregation, typically including COUNT, SUM, MIN, MAX, AVG and MEDIAN. Another form of aggregation is to build strings by concatenating multiple values. In this GraphQL implementation, only COUNT and CONCAT are supported for now. (Future versions may support SUM, MIN, MAX, AVG and MEDIAN if there is demand; let us know.) Aggregation fields are available for multi-valued fields only.

COUNT

Each multi-valued field is accompanied by a field named xyz_COUNT, producing an Int result for the number of values. COUNT fields can take the same filter arguments as the other fields, making it possible to count only certain values.

The following query returns the number of humans that have at least 4 friends:

{
	humans_COUNT (where: {friends: {minCount: 4}})
}

CONCAT

Each multi-valued field is accompanied by a field named xyz_CONCAT, producing a String result by concatenating all matching values of the field. An empty string is delivered if there are no matching values. _CONCAT fields take an optional argument separator if the string should use something other than the default ", " between sub-strings.

The following query returns, for each human, the label and a single string consisting of the labels of all friends whose name starts with L, separated by " and ":

{
	humans {
		label
		friends_CONCAT(where:{name:{pattern:"^L"}}, separator: " and ", orderBy: label)
	}
}

An example response is:

{
	"data": {
		"humans": [
			{
				"label": "Character-1000",
				"friends_CONCAT": "Character-1003"
			},
			{
				"label": "Character-1002",
				"friends_CONCAT": "Character-1000 and Character-1003"
			},
			{
				"label": "Character-1003",
				"friends_CONCAT": "Character-1000"
			}
		]
	}
}

The _CONCAT fields also support the orderBy arguments.
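
The behavior described so far (filter, optional ordering, separator, empty string when nothing matches) can be modeled in a few lines of Python. This is a hedged sketch of the semantics rather than the engine's code; the function name concat_labels, its parameters and the sample data are assumptions.

```python
import re

def concat_labels(objects, name_pattern=None, separator=", ", order_by=None):
    """Concatenate the labels of the matching objects into one string."""
    selected = [
        o for o in objects
        if name_pattern is None or re.search(name_pattern, o["name"])
    ]
    if order_by is not None:
        selected = sorted(selected, key=lambda o: o[order_by])
    # str.join returns "" for an empty list, matching the documented behavior.
    return separator.join(o["label"] for o in selected)

# Friends of one human, in arbitrary storage order (made-up sample data).
friends = [
    {"label": "Character-1003", "name": "Leia Organa"},
    {"label": "Character-1000", "name": "Luke Skywalker"},
]
result = concat_labels(friends, "^L", " and ", "label")
print(result)
```

With these inputs the sketch yields the same string as the second entry of the example response.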

If the concatenated values are objects, then each object will be represented by its label by default. To construct strings from other values, use _CONCAT with the argument labelExpr which takes a SPARQL expression string as its value. In this expression you can access the same pre-bound variables as for filter expressions.

The following query would return a concatenation of the lower-case names of all friends.

{
	humans {
		label
		friends_CONCAT (labelExpr: "LCASE($name)", orderBy: name)
	}
}

The labelExpr argument can also be used for scalar fields, in which case the variable with the name of the underlying field holds the literal value.

Ordering and Paging

By default, RDF triples are unsorted. This section explains how a certain order can be accomplished, and how to page through large numbers of items.

`orderBy`, `orderByDesc` and `orderByExpr`

Any multi-valued field can take the argument orderBy to specify the order of values in the JSON array. The values of orderBy are the names of the fields (from an enumeration) of the object type, including label and uri. The values will be sorted in ascending order unless the argument orderByDesc is set to true, as shown in the following example.

{
	humans (orderBy: height, orderByDesc: true) {
		name
		height
	}
}

The optional argument orderByExpr can take a SPARQL expression as its value. If present, then each value will be run through the expression before being compared. In those expressions, the variable $value can be used to access the current value. For example, use LCASE($value) to order all values in lower-case form.

`first`, `skip` and `orderAll`

If results are returned in order, then the arguments first and skip can be used to page through results. Both take integers: first states the maximum number of results to produce and skip is the offset (starting at 0).

The following example returns the 3rd page of 10 humans, ordered by name.

{
	humans(orderBy: name, first: 10, skip: 20) {
		name
		height
	}
}
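
Since skip is a zero-based offset, the value for a given page follows from simple arithmetic: skip = (page - 1) * first. The Python sketch below shows that calculation and builds the corresponding query string; the helper names page_args and humans_page_query are hypothetical and not part of any TopBraid client library.

```python
def page_args(page, page_size):
    """Return the `first` and `skip` values for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"first": page_size, "skip": (page - 1) * page_size}

def humans_page_query(page, page_size):
    """Build the paging query shown above for an arbitrary page."""
    args = page_args(page, page_size)
    return (
        "{ humans(orderBy: name, first: %(first)d, skip: %(skip)d) "
        "{ name height } }" % args
    )

query = humans_page_query(3, 10)
print(query)
```

For the 3rd page of 10 items this produces first: 10 and skip: 20, as in the query above.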

If orderBy is combined with first, the engine will by default collect an arbitrary set of first values and sort only those. However, if you also specify orderAll: true, the engine will first walk through all available values, sort them, and then return the first values of that fully sorted list, which gives more reliable results. Leaving out orderAll avoids worst-case performance, since sorting many thousands of values can become quite slow, and it is often enough to show any reasonable subset.
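
The difference between the two modes is the order of the "cut" and "sort" steps. The following Python sketch models it under the assumption that, without orderAll, the engine simply takes the values in whatever order the store yields them; the function name take_first and the sample list are made up for illustration.

```python
def take_first(values, first, order_all=False):
    """Model of combining orderBy with first, with and without orderAll."""
    if order_all:
        # Sort everything, then cut: the true first values of the sorted list.
        return sorted(values)[:first]
    # Cut first (in whatever order the store happens to yield), then sort.
    return sorted(values[:first])

# Names in an arbitrary storage order.
names = ["Leia Organa", "Han Solo", "Luke Skywalker", "C-3PO"]
default_page = take_first(names, 2)
ordered_page = take_first(names, 2, order_all=True)
print(default_page, ordered_page)
```

Both results are sorted, but only the orderAll variant is guaranteed to start with the alphabetically smallest name of the whole list.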

Working with rdf:Lists

This feature requires TopBraid 6.2 onwards.

By default, rdf:Lists are treated like any other RDF value, and the recursive structure of list nodes has no special support in GraphQL. If you want rdf:Lists to be handled like JSON arrays (multi-valued GraphQL fields) then you need to make sure that TopBraid can understand the intention and knows what type the list members have. The supported design pattern is described in this blog post. In particular, the sh:node of your property shape needs to be dash:ListShape:

ex:PlayList
    a sh:NodeShape, rdfs:Class ;
    sh:property [
        sh:path ex:songs ;
        sh:maxCount 1 ;
        sh:node dash:ListShape ;
        sh:property [
            sh:path ( [ sh:zeroOrMorePath rdf:rest ] rdf:first ) ;
            sh:class ex:Song ;
        ]
    ] .

Note that the sh:maxCount 1 is recommended to clarify that there can only be one list per subject. Cases with multiple rdf:Lists are not supported.
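
For readers less familiar with rdf:Lists, the recursive structure mentioned above is a linked list of nodes connected by rdf:rest, each carrying one member in rdf:first and ending in rdf:nil. The Python sketch below flattens such a chain into an array, which is conceptually what a multi-valued GraphQL field presents. The dict-based node representation, the function name list_members and the ex:SongA/ex:SongB identifiers are assumptions for illustration only.

```python
RDF_NIL = "rdf:nil"

def list_members(nodes, head):
    """Walk an rdf:first/rdf:rest chain and return the members in order."""
    members = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError("cyclic rdf:List at %s" % node)
        seen.add(node)
        members.append(nodes[node]["rdf:first"])
        node = nodes[node]["rdf:rest"]
    return members

# A two-element list as it might be stored for ex:songs (made-up data).
nodes = {
    "_:b1": {"rdf:first": "ex:SongA", "rdf:rest": "_:b2"},
    "_:b2": {"rdf:first": "ex:SongB", "rdf:rest": RDF_NIL},
}
songs = list_members(nodes, "_:b1")
print(songs)
```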

Transformations

Each field with a scalar value type can take an argument transform that takes a SPARQL expression string as its value. If a transform is present, the values delivered for the field are passed into the expression (as a variable with the same name as the field), and the result of the evaluation will be returned instead of the original value. In these expressions, the values of any other single-valued field can also be accessed as named variables, as can $label and $uri, as well as $this to access the surrounding resource.

In the following example, the height of each human is converted from meters to feet, and the name is returned in all upper-case letters.

{
	humans {
		name (transform: "UCASE($name)")
		height (transform: "$height * 3.28084")
	}
}
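
The factor 3.28084 used here is the rounded reciprocal of 0.3048, the divisor used in the introductory example, so both expressions describe the same conversion. A short Python check of that arithmetic, assuming a height of 1.5 meters (the value implied by the introductory result):

```python
METERS_PER_FOOT = 0.3048  # exact by definition of the international foot

def meters_to_feet(meters):
    """Convert meters to feet by dividing by the exact foot length."""
    return meters / METERS_PER_FOOT

height_m = 1.5  # assumed input, implied by the introductory result value
exact = meters_to_feet(height_m)
rounded = height_m * 3.28084
print(exact, rounded)
```

The two results agree to about six decimal places, so the choice between the two forms only matters if the full floating-point precision is needed.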

Derived Fields

As of TopBraid 6.1 this feature is deactivated by default (to reduce the complexity of the generated GraphQL schema and make it more difficult to write inefficient queries). If needed, enable it using the Server Configuration Parameters page, or use Property Value Rules to define the kinds of inferences that are supported for a schema.

While transformations can modify how existing values are returned, it is sometimes useful to compute arbitrary field values dynamically. Each object type has the following generated fields in addition to the declared fields: deriveBoolean, deriveFloat, deriveInt and deriveString. The value types of these are single values of the corresponding scalar types. They take a single required argument expr of type String that must be a valid SPARQL expression. In these expressions, the values of other properties of the underlying RDF resource can be queried as pre-bound variables. Furthermore, the variables $this, $label and $uri are pre-bound. The result of the expression will be returned in the specified scalar type.

It is common to use GraphQL aliases so that the names in the generated JSON will not start with "derive". The following example delivers the length of each human's name as a separate integer field:

{
	humans {
		name
		nameLength: deriveInt(expr: "STRLEN($name)")
	}
}

Producing for example:

{
	"data": {
		"humans": [
			{
				"name": "Luke Skywalker",
				"nameLength": 14
			},
			{
				"name": "Han Solo",
				"nameLength": 8
			},
			{
				"name": "Leia Organa",
				"nameLength": 11
			}
		]
	}
}

In addition to these single-valued deriveXY fields, there is also the special field deriveStrings that can be used to generate a list of String values. This takes a single required argument query that must be a SPARQL SELECT query producing a single result variable. The query will be executed with the variable $this pre-bound to the current context resource.

{
	humans {
		name
		friendNames: deriveStrings (query: "SELECT ?name { $this starwars:friends/starwars:name ?name } ORDER BY ?name")
	}
}

Producing for example:

{
	"data": {
		"humans": [
			{
				"name": "Luke Skywalker",
				"friendNames": [
					"C-3PO",
					"Han Solo",
					"Leia Organa",
					"R2-D2"
				]
			},
			...
		]
	}
}

Note that the June 2018 version of the GraphQL spec introduces multi-line strings using the """ syntax, making complex SPARQL queries more readable.

Introspection

Any standards-compliant GraphQL implementation supports some built-in introspection query types that provide information about the available object and field types. However, these generic capabilities would not return the details of the richer RDF/SHACL model underneath, and would also include information about the various automatically generated objects and fields that are often not of interest to introspecting clients. The alternative introspection query types from this section are available to provide a view on the underlying SHACL structure.

The following example illustrates a typical use case, returning information that may be used to populate an input form for objects matching a given type:

{
	_typeShapeByName(name: "Human") {
		groups {
			label
			fields {
				label
			}
		}
	}
}
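
As a rough illustration of the input-form use case, the following Python sketch turns a response to the query above into a list of form sections. The response structure follows from the shape of the query, but the group labels and field labels in the sample are invented, and the function name form_layout is hypothetical.

```python
def form_layout(response):
    """Return (group label, [field labels]) pairs from the introspection result."""
    shape = response["data"]["_typeShapeByName"]
    return [
        (group["label"], [field["label"] for field in group["fields"]])
        for group in shape["groups"]
    ]

# Invented sample response; real labels depend on the underlying shapes.
response = {"data": {"_typeShapeByName": {"groups": [
    {"label": "Identity", "fields": [{"label": "id"}, {"label": "name"}]},
    {"label": "Relationships", "fields": [{"label": "friends"}]},
]}}}
layout = form_layout(response)
print(layout)
```

A UI could render one section per tuple, with one input widget per field label.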

Here is the shape introspection schema:

type RootRDFQuery {
	...
	_typeShapeByName(name: String!): _TypeShape
	_typeShapeByURI(uri: String!): _TypeShape
	_typeShapesForResource(uri: String!): [_TypeShape]
}

# Metadata about a type as derived from its shape(s).
type _TypeShape {
	uri: String!
	name: String!
	label: String!
	rootQueryField: String
	fields: [_FieldShape]
	fieldByName (name: String!): _FieldShape
	groups: [_FieldGroup]
}

# Metadata about a field group.
type _FieldGroup {
	uri: String!
	label: String!
	order: Float!
	fields: [_FieldShape]
}

# Metadata about a field as derived from its shape(s).
type _FieldShape {
	name: String!
	label: String!
	datatype: _Resource
	externalType: _Resource
	typeShape: _TypeShape
	scalar: Boolean!
	unionTypeNames: [String]
	minCount: Int!
	maxCount: Int
	min: String
	max: String
	isMinExclusive: Boolean!
	isMaxExclusive: Boolean!
	minLength: Int!
	maxLength: Int
	pattern: String
	group: _FieldGroup
	order: Float!
}

This information can be combined with follow-up queries using the standard introspection schema, for example to find the enumerated values of an enum.

Note that the special cases of xsd:string or rdf:langString and its inverse variation rdf:langString or xsd:string (using sh:or) are mapped to rdfs:Literal in the _FieldShape.datatype field.