Querying RDF Graphs with GraphQL
This document is part of the TopQuadrant GraphQL Technology Pages
Introduction
TopBraid takes SHACL shape definitions or GraphQL schemas as input and generates an enhanced GraphQL schema that provides numerous features to query data stored in an RDF dataset. In this document we take the viewpoint of a typical GraphQL user, such as a UI developer or data analyst, to explain which features are available. Our starting point is a GraphQL schema:
Example GraphQL Schema
type Human {
    id: ID!
    name: String!
    height: Float
    friends: [Human]
}
The processor will internally convert it to SHACL and then generate the following GraphQL schema:
Example GraphQL Schema
schema {
    query: RootRDFQuery
}
type RootRDFQuery {
    humans (... filters etc, see later...): [Human]
    ... generated fields for aggregations and introspection ...
}
type Human {
    uri: ID!
    label: String!
    id (... filters etc...): ID!
    name (... filters etc...): String!
    height (... filters etc...): Float
    friends (... filters etc...): [Human]
    ... generated fields for aggregations, derived values ...
}
As shown above, the system automatically produces a root query object that has fields for every public shape, with a name that is basically the plural form of the shape name. These root query fields can take a large number of arguments to select which of the matching objects shall be returned, but we get to that later.
Each object type has a field uri storing the URI of the underlying RDF resource and a field label to query a human-readable display label.
Completing this introductory example, here is a GraphQL query against this schema that returns all humans whose name contains L, together with all their friends, converting the height from meters to feet.
Example GraphQL Query
{
    humans (where: {name: {contains: "L"}}, orderBy: name) {
        id
        name
        height (transform: "$height / 0.3048")
        friends {
            id
            name
        }
    }
}
A possible result JSON would be:
Example JSON Result
{
"data": {
"humans": [
{
"id": "1003",
"name": "Leia Organa",
"height": 4.921259842519685,
"friends": [
{
"id": "1002",
"name": "Han Solo"
},
{
"id": "1000",
"name": "Luke Skywalker"
}
]
},
...
]
}
}
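From a client's point of view, such a query is sent like any other GraphQL request, typically as a JSON body with a query member in an HTTP POST. The following Python sketch only builds that request. The endpoint URL is a placeholder assumption, not a documented TopBraid address, and the lines that would actually send the request are left commented out.

```python
import json
import urllib.request

# Hypothetical endpoint (assumption); replace with the GraphQL URL of your server.
ENDPOINT = "https://example.org/graphql"

QUERY = """
{
    humans (where: {name: {contains: "L"}}, orderBy: name) {
        id
        name
        height (transform: "$height / 0.3048")
        friends {
            id
            name
        }
    }
}
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Wrap a GraphQL query string in a JSON POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(QUERY)

# To run the query against a live server, something like this could be used:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

Any authentication that a particular server requires would have to be added to the request headers.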
Filtering
Most fields that are generated by the engine can take arguments to filter out which values to return. The various kinds of filters are described in the next sub-sections.
Filtering by Property Values
It is a common design pattern in GraphQL to allow filtering by direct value matches, e.g. give me all humans that appeared in “JEDI”:
Example GraphQL Query
{
humans(appearsIn: "JEDI") {
name
}
}
The processor produces one such argument for each field that is derived from the property shapes. This is done for the top-level query fields and the object-valued fields of each generated object type. The values of these arguments must be scalar JSON values. In order to match an object-valued property, use the URIs of the values as ID strings.
Use uri: "..." to return only the object with the given URI.
Filtering by Constraints
The where argument is an expressive way to filter values based on constraints similar to SHACL. It is available for all object-valued declared fields including the root query fields. The values of where are input objects with internal names such as Human_where and fields for each declared field of the type that is constrained.
This is best explained by means of an example. The following query returns all humans where at least one starship exists that has a length greater than or equal to 30 units:
Example GraphQL Query
{
    humans(where: {
        starships: {
            exists: {
                length: {
                    minInclusive: 30
                }
            }
        }
    }) {
        name
        height
        homePlanet
        starships {
            name
            length
        }
    }
}
Parameter | Type | Condition |
---|---|---|
hasValue | Same as field | Object must have the given value (possibly among others) |
minCount | Int | Object must have at least minCount values |
maxCount | Int | Object must have at most maxCount values |
minExclusive | Same as field | Object must have a value so that value > minExclusive |
minInclusive | Same as field | Object must have a value so that value >= minInclusive |
maxInclusive | Same as field | Object must have a value so that value <= maxInclusive |
maxExclusive | Same as field | Object must have a value so that value < maxExclusive |
contains | String | Object must have a value that has the given sub-string |
 | String | Object must be a string starting with the given sub-string |
flags | String | Optional flags for the pattern regex engine, such as “i” to ignore case |
exists | Nested object | One of the values must conform to all nested constraints |
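To make the semantics of these parameters more tangible, the following Python sketch evaluates a where object against plain in-memory dictionaries. It is only an illustrative model of the conditions listed in the table, not how the engine is implemented; the function names and the sample data are assumptions made for this example.

```python
def matches(values: list, constraint: dict) -> bool:
    """Return True if the values of one field satisfy all given parameters."""
    checks = {
        "hasValue": lambda arg: arg in values,
        "minCount": lambda arg: len(values) >= arg,
        "maxCount": lambda arg: len(values) <= arg,
        "minExclusive": lambda arg: any(v > arg for v in values),
        "minInclusive": lambda arg: any(v >= arg for v in values),
        "maxInclusive": lambda arg: any(v <= arg for v in values),
        "maxExclusive": lambda arg: any(v < arg for v in values),
        "contains": lambda arg: any(arg in v for v in values),
        # exists: at least one value must conform to the nested constraints
        "exists": lambda arg: any(conforms(v, arg) for v in values),
    }
    return all(checks[name](arg) for name, arg in constraint.items())


def conforms(obj: dict, where: dict) -> bool:
    """Return True if the object satisfies the constraints on every listed field."""
    for field, constraint in where.items():
        raw = obj.get(field, [])
        values = raw if isinstance(raw, list) else [raw]
        if not matches(values, constraint):
            return False
    return True


# Illustrative sample data (assumption, not taken from a real dataset).
humans = [
    {"name": "Luke Skywalker", "starships": [{"name": "X-Wing", "length": 12.5}]},
    {"name": "Han Solo", "starships": [{"name": "Millennium Falcon", "length": 34.37}]},
    {"name": "Leia Organa", "starships": []},
]

# Same structure as the where argument of the query above.
where = {"starships": {"exists": {"length": {"minInclusive": 30}}}}
selected = [h["name"] for h in humans if conforms(h, where)]
print(selected)
```

With this sample data, only the human with a starship of length 30 or more is selected.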
Filtering by Text
Object-valued fields can use the argument queryText to allow free-text search across the datatype values of any declared field of the object type. The values of queryText are interpreted as regular expressions based on SPARQL’s regex operator, ignoring case.
The following example produces all humans where any scalar field (such as name, height or homePlanet) has a value starting with L:
Example GraphQL Query
{
humans (queryText: "^l") {
name
}
}
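The matching behaviour of queryText can be approximated in a few lines of Python. This is a rough model only: Python's re module and SPARQL's regex function differ in some details of their pattern syntax, and the helper name and sample data below are assumptions for illustration.

```python
import re


def matches_query_text(obj: dict, query_text: str) -> bool:
    """Return True if any scalar value of the object matches the pattern, ignoring case."""
    pattern = re.compile(query_text, re.IGNORECASE)
    return any(
        pattern.search(str(value)) is not None
        for value in obj.values()
        if isinstance(value, (str, int, float))
    )


# Illustrative sample data (assumption).
humans = [
    {"name": "Luke Skywalker", "height": 1.72, "homePlanet": "Tatooine"},
    {"name": "Han Solo", "height": 1.8, "homePlanet": None},
    {"name": "Leia Organa", "height": 1.5, "homePlanet": "Alderaan"},
]

names = [h["name"] for h in humans if matches_query_text(h, "^l")]
print(names)
```

Because the match is case-insensitive, the pattern "^l" selects the two names that begin with an upper-case L.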
Filtering by SPARQL Expressions
As the ultimate fallback with a maximum of expressivity, any field can take the argument filter, the values of which must be valid SPARQL FILTER expressions. When these SPARQL expressions are evaluated, certain variables have pre-defined values. The variables $label and $uri have the corresponding field values, and the special variable $this refers to the current query object (RDF resource). Furthermore, for each declared field of the object type that is the value type of the queried field, a corresponding variable will hold the value of the object that is matched. For example, the following query retrieves all humans that are taller than 1.6 units:
Example GraphQL Query
{
humans (filter: "$height > 1.6") {
name
}
}
Note that these field variables are only bound for single-valued properties (sh:maxCount 1) because the system would otherwise need to pick a “random” value from the underlying database. In order to query values of multi-valued properties, use SPARQL EXISTS or NOT EXISTS expressions.
Aggregations
As of TopBraid 6.1 this feature is deactivated by default (to reduce the complexity of the generated GraphQL schema). If needed, enable it using the Server Configuration Parameters page, or use Property Value Rules to expose selected kinds of aggregations.
Most database query languages support some form of aggregations, typically including COUNT, SUM, MIN, MAX, AVG and MEDIAN. Another form of aggregations is to build strings by concatenating multiple values. In this GraphQL implementation, only COUNT and CONCAT are supported for now. (Future versions may support SUM, MIN, MAX, AVG and MEDIAN if there is demand; let us know.) Aggregation fields are available for multi-valued fields only.
COUNT
Each multi-valued field is accompanied by a field named xyz_COUNT, producing an Int result for the number of values. COUNT fields can take the same filter arguments as the other fields, making it possible to count only certain values.
The following query returns the number of humans that have at least 4 friends:
Example GraphQL Query
{
humans_COUNT (where: {friends: {minCount: 4}})
}
CONCAT
Each multi-valued field is accompanied by a field named xyz_CONCAT, producing a String result by concatenating all matching values of the field. An empty string is delivered if there are no matching values. _CONCAT fields take an optional argument separator if the string should use something other than the default ", " between sub-strings.
The following query returns, for each human, the label and a single string consisting of the labels of all friends whose name contains L, separated by " and ":
Example GraphQL Query
{
humans {
label
friends_CONCAT(where:{name:{contains:"L"}}, separator: " and ", orderBy: label)
}
}
An example response is:
Example JSON Result
{
"data": {
"humans": [
{
"label": "Character-1000",
"friends_CONCAT": "Character-1003"
},
{
"label": "Character-1002",
"friends_CONCAT": "Character-1000 and Character-1003"
},
{
"label": "Character-1003",
"friends_CONCAT": "Character-1000"
}
]
}
}
The _CONCAT fields also support the orderBy arguments.
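The way a _CONCAT value comes together (filter the values, order them, then join their labels with the separator) can be modelled in Python as follows. This is a sketch under assumptions: the friend lists are invented to be consistent with the example response, the helper name is hypothetical, and the sub-string match is assumed to be case-sensitive.

```python
# Illustrative data (assumption): label -> name and labels of friends.
characters = {
    "Character-1000": {"name": "Luke Skywalker",
                       "friends": ["Character-1002", "Character-1003"]},
    "Character-1002": {"name": "Han Solo",
                       "friends": ["Character-1003", "Character-1000"]},
    "Character-1003": {"name": "Leia Organa",
                       "friends": ["Character-1002", "Character-1000"]},
}


def friends_concat(label: str, contains: str, separator: str = ", ") -> str:
    """Join the labels of all friends whose name contains the given sub-string."""
    matching = [
        friend
        for friend in characters[label]["friends"]
        if contains in characters[friend]["name"]
    ]
    # sorted() stands in for orderBy: label; joining an empty list yields "".
    return separator.join(sorted(matching))


result = {label: friends_concat(label, "L", " and ") for label in characters}
for label, value in result.items():
    print(label, "->", value)
```

Joining an empty list of matches produces the empty string, which mirrors the behaviour described for fields without matching values.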
If the concatenated values are objects, then each object will be represented by its label by default. To construct strings from other values, use _CONCAT with the argument labelExpr, which takes a SPARQL expression string as its value. In this expression you can access the same pre-bound variables as for filter expressions.
The following query would return a concatenation of the lower-case names of all friends.
Example GraphQL Query
{
humans {
label
friends_CONCAT (labelExpr: "LCASE($name)", orderBy: name)
}
}
The labelExpr argument can also be used for scalar fields, in which case the variable with the name of the underlying field holds the literal value.
Ordering and Paging
By default, RDF triples are unsorted. This section explains how a certain order can be accomplished, and how to page through large numbers of items.
orderBy, orderByDesc and orderByExpr
Use orderBy to specify the order of values in the JSON array. The values of orderBy are the names of the fields (from an enumeration) of the object type, including label and uri. The values will be sorted in ascending order unless the argument orderByDesc is set to true, as shown in the following example.
Example GraphQL Query
{
humans (orderBy: height, orderByDesc: true) {
name
height
}
}
The optional argument orderByExpr can take a SPARQL expression as its value. If present, then each value will be run through the expression before being compared. In those expressions, the variable $value can be used to access the current value. For example, use LCASE($value) to order all values in lower-case form.
first, skip and orderAll
If results are returned in order, then the arguments first and skip can be used to page through results. Both take integers: first states the maximum number of results that shall be produced and skip is the offset (starting at 0).
The following example returns the 3rd page of 10 humans, ordered by name.
Example GraphQL Query
{
humans(orderBy: name, first: 10, skip: 20) {
name
height
}
}
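The arithmetic behind such a paging query is simple: for 1-based page numbers, skip is the number of items on all previous pages. The following Python sketch derives the two arguments and builds the query text; the helper names are assumptions made for this illustration.

```python
def page_arguments(page: int, page_size: int) -> dict:
    """Compute first and skip for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"first": page_size, "skip": (page - 1) * page_size}


def humans_page_query(page: int, page_size: int) -> str:
    """Build a paged query for humans, ordered by name."""
    args = page_arguments(page, page_size)
    return (
        "{\n"
        f"    humans(orderBy: name, first: {args['first']}, skip: {args['skip']}) {{\n"
        "        name\n"
        "        height\n"
        "    }\n"
        "}"
    )


print(humans_page_query(3, 10))
```

For page 3 with 10 items per page this yields first: 10 and skip: 20, as in the query above.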
If orderBy is combined with first, the engine will by default first collect a random set of “first” values and sort those. However, if you also specify orderAll: true then the engine will first walk through all available values, sort them, and then return the first values of that fully sorted list, which gives more reliable results. By leaving out orderAll you can ensure that the system does not run into worst-case performance, as sorting many thousands of values might become quite slow, and often it is enough to merely show any reasonable subset.
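The difference between the two strategies can be illustrated with a small Python model. It is a simplification: the list order stands in for whatever order the database happens to deliver values in, and the function names and sample names are assumptions.

```python
def first_then_sort(values: list, first: int) -> list:
    """Default behaviour: collect 'first' values as they arrive, then sort only those."""
    return sorted(values[:first])


def sort_then_first(values: list, first: int) -> list:
    """Behaviour with orderAll: true: sort all values, then return the first ones."""
    return sorted(values)[:first]


# Illustrative storage order (assumption); RDF triples have no inherent order.
stored = ["Leia Organa", "Han Solo", "Wilhuff Tarkin", "Darth Vader", "Luke Skywalker"]

print(first_then_sort(stored, 3))
print(sort_then_first(stored, 3))
```

In this model the first call sorts only the three values it happened to collect, while the second returns the three alphabetically smallest names of the whole set, at the cost of sorting every value.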
Working with rdf:Lists
This feature requires TopBraid 6.2 onwards.
By default, rdf:Lists are treated like any other RDF value, and the recursive structure of list nodes has no special support by GraphQL. If you want rdf:Lists to be handled like JSON arrays (multi-valued GraphQL fields) then you need to make sure that TopBraid can understand the intention and knows what type the list members have. The supported design pattern is described in this blog post. In particular, the sh:node of your property shape needs to be dash:ListShape:
Example RDF/SHACL
ex:PlayList
    a sh:NodeShape, rdfs:Class ;
    sh:property [
        sh:path ex:songs ;
        sh:maxCount 1 ;
        sh:node dash:ListShape ;
        sh:property [
            sh:path ( [ sh:zeroOrMorePath rdf:rest ] rdf:first ) ;
            sh:class ex:Song ;
        ]
    ] .
Note that the sh:maxCount 1 is recommended to clarify that there can only be one list per subject. Cases with multiple rdf:Lists are not supported.
Working with Reification
This feature requires TopBraid 6.4 onwards.
Properties that have been reified using dash:reifiableBy will have an additional GraphQL field consisting of the property field name and “Reif”. For example, you can query the time stamp of each value of the parent property for a person using:
Example GraphQL Query
{
persons {
parent(orderBy: label) { label }
parentReif(orderBy: label) { timeStamp }
}
}
Transformations
Each field with a scalar value type can take an argument transform that takes a SPARQL expression string as its value. If a transform is present, the values delivered for the field are passed into the expression (as a variable with the same name as the field), and the result of the evaluation will be returned instead of the original value. In these expressions, the values of any other single-valued field can also be accessed as named variables, as can $label and $uri, as well as $this to access the surrounding resource.
In the following example, the height of each human is converted from meters to feet, and the name is returned in all upper-case letters.
Example GraphQL Query
{
humans {
name (transform: "UCASE($name)")
height (transform: "$height * 3.28084")
}
}
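This document uses two different expressions for the same unit conversion: "$height / 0.3048" in the introductory example and "$height * 3.28084" here. As a quick sanity check, the following Python sketch confirms that both agree to about six significant digits; the function names are assumptions, and the input of 1.5 meters is chosen because it reproduces the height shown in the introductory JSON result.

```python
import math

METERS_PER_FOOT = 0.3048  # exact by definition of the international foot


def feet_by_division(meters: float) -> float:
    """Mirror of the transform "$height / 0.3048"."""
    return meters / METERS_PER_FOOT


def feet_by_factor(meters: float) -> float:
    """Mirror of the transform "$height * 3.28084" (rounded conversion factor)."""
    return meters * 3.28084


exact = feet_by_division(1.5)
rounded = feet_by_factor(1.5)
print(exact, rounded, math.isclose(exact, rounded, rel_tol=1e-6))
```

The division form is exact up to floating-point precision, while the multiplication form carries the small rounding error of the factor.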
Derived Values
As of TopBraid 6.1 this feature is deactivated by default (to reduce the complexity of the generated GraphQL schema and make it more difficult to write inefficient queries). If needed, enable it using the Server Configuration Parameters page, or use Property Value Rules to define the kinds of inferences that are supported for a schema.
While transformations can modify how existing values are returned, it is sometimes useful to compute arbitrary field values dynamically. Each object type has the following generated fields in addition to the declared fields: deriveBoolean, deriveFloat, deriveInt and deriveString. The value types of these are single values of the corresponding scalar types. They take a single required argument expr of type String that must be a valid SPARQL expression. In these expressions, the values of other properties of the underlying RDF resource can be queried as pre-bound variables. Furthermore, the variables $this, $label and $uri are pre-bound. The result of the expression will be returned in the specified scalar type.
It is common to use GraphQL aliases so that the names in the generated JSON will not start with “derive”. The following example delivers the length of each human’s name as a separate integer field:
Example GraphQL Query
{
    humans {
        name
        nameLength: deriveInt(expr: "STRLEN($name)")
    }
}
Producing for example:
Example JSON result
{
"data": {
"humans": [
{
"name": "Luke Skywalker",
"nameLength": 14
},
{
"name": "Han Solo",
"nameLength": 8
},
{
"name": "Leia Organa",
"nameLength": 11
}
]
}
}
In addition to these single-valued deriveXY fields, there is also the special field deriveStrings that can be used to generate a list of String values. This takes a single required argument query that must be a SPARQL SELECT query producing a single result variable. The query will be executed with the variable $this pre-bound to the current context resource.
Example GraphQL Query
{
humans {
name
friendNames: deriveStrings (query: "SELECT ?name { $this starwars:friends/starwars:name ?name } ORDER BY ?name")
}
}
Producing for example:
Example JSON result
{
"data": {
"humans": [
{
"name": "Luke Skywalker",
"friendNames": [
"C-3PO",
"Han Solo",
"Leia Organa",
"R2-D2"
]
},
...
]
}
}
Note that the June 2018 version of the GraphQL spec introduces multi-line strings using the """ syntax, making complex SPARQL queries more readable.
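As an illustration, the deriveStrings query from above could be written with such a block string. The Python sketch below embeds it in a client-side request body; single-quoted Python delimiters are used so that the GraphQL """ markers can appear unescaped. Whether a given server accepts block strings depends on its GraphQL parser supporting the June 2018 syntax, which is an assumption here.

```python
import json

# GraphQL query using a block string (""" ... """) for the embedded SPARQL.
QUERY = '''
{
    humans {
        name
        friendNames: deriveStrings (query: """
            SELECT ?name
            {
                $this starwars:friends/starwars:name ?name
            }
            ORDER BY ?name
        """)
    }
}
'''

# JSON body as it would be sent in a GraphQL HTTP request.
payload = json.dumps({"query": QUERY})
print(payload)
```

The JSON encoder takes care of escaping the quotes and line breaks, so the query text itself can stay readable.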
Introspection
Any standards-compliant GraphQL implementation supports some built-in introspection query types that provide information about the available object and field types. However, these generic capabilities would not return the details of the richer RDF/SHACL model underneath, and would also include information about the various automatically generated objects and fields that are often not of interest to introspecting clients. The alternative introspection query types from this section are available to provide a view on the underlying SHACL structure.
The following example illustrates a typical use case, returning information that may be used to populate an input form for objects matching a given type:
Example GraphQL Query
{
_typeShapeByName(name: "Human") {
groups {
label
fields {
label
}
}
}
}
Here is the shape introspection schema:
Example GraphQL Schema
type RootRDFQuery {
...
_typeShapeByName(name: String!): _TypeShape
_typeShapeByURI(uri: String!): _TypeShape
_typeShapesForResource(uri: String!): [_TypeShape]
}
# Metadata about a type as derived from its shape(s).
type _TypeShape {
uri: String!
name: String!
label: String!
rootQueryField: String
fields: [_FieldShape]
fieldByName (name: String!): _FieldShape
groups: [_FieldGroup]
}
# Metadata about a field group.
type _FieldGroup {
uri: String!
label: String!
order: Float!
fields: [_FieldShape]
}
# Metadata about a field as derived from its shape(s).
type _FieldShape {
name: String!
label: String!
datatype: _Resource
externalType: _Resource
typeShape: _TypeShape
scalar: Boolean!
unionTypeNames: [String]
minCount: Int!
maxCount: Int
min: String
max: String
isMinExclusive: Boolean!
isMaxExclusive: Boolean!
minLength: Int!
maxLength: Int
pattern: String
group: _FieldGroup
order: Float!
}
This information can be combined with follow-up queries using the standard introspection schema, for example to find the enumerated values of an enum.
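Such a follow-up query can rely on the __type field and its enumValues, which are part of the standard GraphQL introspection schema. The Python sketch below builds the query and extracts the value names from a response; the enum name Episode and the sample response are assumptions made for illustration.

```python
import json


def enum_values_query(type_name: str) -> str:
    """Build a standard introspection query for the values of an enum type."""
    # json.dumps produces a quoted, escaped string literal for the type name.
    return "{ __type(name: %s) { enumValues { name } } }" % json.dumps(type_name)


def enum_value_names(response: dict) -> list:
    """Extract the enum value names; unknown or non-enum types yield an empty list."""
    type_info = response["data"]["__type"]
    if type_info is None:
        return []
    return [value["name"] for value in (type_info["enumValues"] or [])]


# Hypothetical response for an enum named Episode (assumption).
sample_response = {
    "data": {
        "__type": {
            "enumValues": [{"name": "NEWHOPE"}, {"name": "EMPIRE"}, {"name": "JEDI"}]
        }
    }
}

print(enum_values_query("Episode"))
print(enum_value_names(sample_response))
```

The extracted names could then be used, for example, to populate a drop-down list in a generated input form.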
Note that the special cases of xsd:string or rdf:langString and its inverse variation rdf:langString or xsd:string (using sh:or) are mapped to rdfs:Literal in the _FieldShape.datatype field.