GraphQL Data Shapes Directives

This document is part of the TopQuadrant GraphQL Technology Pages

This document defines an easy-to-use set of GraphQL directives that can significantly improve the value of GraphQL schemas for JSON-based data processing. The @uri directive can be used to state how globally unique identifiers can be produced for JSON objects, based on field values. The @class directive defines subclass relationships between GraphQL types, going beyond the limited inheritance mechanism of the current GraphQL spec. The @shape directive defines semantic constraints that all JSON objects of the GraphQL type are expected to conform to. The @display directive specifies user interface metadata for form building, including default values.

Unique Identifiers for GraphQL

JSON objects delivered by GraphQL services typically represent entities from an underlying database or some other object repository. However, each time someone requests information about these entities, a new JSON object is produced, and there is no reliable mechanism to determine whether two such objects refer to the same entity.

Let's look at an example. Below are the results of three separate GraphQL requests:

{
	"human": {
		"id": "HAN",
		"name": "Han Solo",
		"friends": [
			{ "id": "LEIA" }, 
			{ "id": "LUKE" }
		]
	}
}

{
	"human": {
		"id": "LUKE",
		"name": "Luke Skywalker", 
		"friends": [
			{ "id": "HAN" }
		]
	}
}

{
	"human": {
		"id": "LEIA",
		"name": "Leia Organa",
		"friends": null
	}
}

A client that is trying to make sense of these objects starts out with three individual, disconnected JSON tree structures.

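However, both the conceptual model and the underlying database are more likely a single connected graph, in which each of Han, Leia and Luke appears exactly once and the friends fields link these entities to one another.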

With such a combined object structure, a client consuming the JSON objects delivered by a GraphQL service could collect the data from any subsequent request and put it into the right slots. For example, if a future request delivers the height of one of the humans above, a height field can be added to that human's object. The result is a true graph structure that can hold the collective results from many individual JSON tree structures.
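The merging step described above can be sketched in Python as follows. This is a minimal illustration, not a TopQuadrant API: the function name merge_into_graph is hypothetical, and the sketch assumes that the id field alone identifies a human and that every list-valued field contains nested objects. Nested objects are replaced by the ids they refer to, so the results of repeated requests accumulate in one graph.

```python
def merge_into_graph(graph: dict, obj: dict) -> str:
    """Merge one JSON object into graph (a dict keyed by id) and return its id.

    Assumption for this sketch: every object has an "id" field, and every
    list-valued field (such as friends) contains nested objects.
    """
    node = graph.setdefault(obj["id"], {})
    for field, value in obj.items():
        if value is None:
            continue  # null fields add no information
        if isinstance(value, list):
            # Replace nested objects by references (ids) to merged nodes.
            refs = node.setdefault(field, set())
            for item in value:
                refs.add(merge_into_graph(graph, item))
        else:
            node[field] = value
    return obj["id"]


responses = [
    {"human": {"id": "HAN", "name": "Han Solo",
               "friends": [{"id": "LEIA"}, {"id": "LUKE"}]}},
    {"human": {"id": "LUKE", "name": "Luke Skywalker",
               "friends": [{"id": "HAN"}]}},
    {"human": {"id": "LEIA", "name": "Leia Organa", "friends": None}},
]

graph: dict = {}
for response in responses:
    merge_into_graph(graph, response["human"])

# A later request that returns a height (hypothetical value) just adds a field.
merge_into_graph(graph, {"id": "LUKE", "height": 1.72})
print(sorted(graph["HAN"]["friends"]))  # ['LEIA', 'LUKE']
```

Because nodes are keyed by id, Leia's node is first created from the nested friends entry of Han and later completed by her own result.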

We have defined GraphQL directives that we use to derive such unique identifiers, for example human:HAN for Han Solo. The technology used for these identifiers is well established in Web technologies including RDF: Uniform Resource Identifiers (URIs).

These identifiers are written as abbreviated URIs: human:HAN is the abbreviation for http://example.org/human/HAN, where the namespace prefix human: stands for http://example.org/human/.
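Expanding and abbreviating URIs with a prefix map is a simple string operation. The sketch below assumes a plain Python dictionary as the prefix map; the helper names expand and compact are hypothetical.

```python
from typing import Optional

PREFIXES = {"human": "http://example.org/human/"}


def expand(curie: str, prefixes: dict) -> str:
    """Expand an abbreviated URI such as 'human:HAN' to a full URI."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return curie  # unknown prefix (or already a full URI): leave unchanged


def compact(uri: str, prefixes: dict) -> str:
    """Abbreviate a full URI using the longest matching namespace, if any."""
    best: Optional[str] = None
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace) and (
            best is None or len(namespace) > len(prefixes[best])
        ):
            best = prefix
    if best is None:
        return uri
    return best + ":" + uri[len(prefixes[best]):]


print(expand("human:HAN", PREFIXES))                       # http://example.org/human/HAN
print(compact("http://example.org/human/LUKE", PREFIXES))  # human:LUKE
```

Choosing the longest matching namespace keeps compact deterministic when one namespace is nested inside another.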

Using URIs makes it unlikely that identifiers for data from one source (described by one GraphQL schema) will clash with identifiers for data from another source. Furthermore, URIs can be referenced from outside your local data graph, potentially leading to a large knowledge graph that can drive user applications and support queries that cross local boundaries.

URI Templates

Many GraphQL objects include one (or more) ID fields that are used to identify an entity delivered by a server. The @uri directive uses such ID fields to derive globally unique URIs. All we need to do is annotate the GraphQL schema as follows:

schema {
	query: Query
}
				
type Query {
	human(id: ID!): Human
}

type Human @uri(template: "http://example.org/human/{$id}") {
	id: ID!
	name: String!
	friends: [Human]
}

Using the URI templates, the three JSON result objects from the introduction can be turned into a data graph that collects and merges information from multiple requests:

URI from JSON Object	Field	Value
`http://example.org/human/HAN`	`id`	`"HAN"`
`http://example.org/human/HAN`	`name`	`"Han Solo"`
`http://example.org/human/HAN`	`friends`	`http://example.org/human/LEIA`
`http://example.org/human/HAN`	`friends`	`http://example.org/human/LUKE`
`http://example.org/human/LEIA`	`id`	`"LEIA"`
`http://example.org/human/LEIA`	`name`	`"Leia Organa"`
`http://example.org/human/LUKE`	`id`	`"LUKE"`
`http://example.org/human/LUKE`	`name`	`"Luke Skywalker"`
`http://example.org/human/LUKE`	`friends`	`http://example.org/human/HAN`

(Readers familiar with RDF will recognize that the table above corresponds directly to how RDF graph databases store information, namely as so-called triples consisting of a subject, a predicate and an object.)
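The table above can be derived mechanically. In the sketch below the Human template is hard-coded and the helper names are hypothetical; literal values and URIs are both plain Python strings, whereas a real implementation would distinguish them, as RDF does. Collecting triples in a set removes duplicates, such as the id of LEIA, which arrives both in a nested friends object and in her own result.

```python
from urllib.parse import quote


def human_uri(obj: dict) -> str:
    # Mirrors @uri(template: "http://example.org/human/{$id}") on type Human.
    return "http://example.org/human/" + quote(str(obj["id"]), safe="")


def add_triples(obj: dict, triples: set) -> str:
    """Add (subject, field, value) triples for obj and its nested objects."""
    subject = human_uri(obj)
    for field, value in obj.items():
        if value is None:
            continue  # null values produce no triples
        if isinstance(value, list):
            # Nested objects become links to their own URIs.
            for item in value:
                triples.add((subject, field, add_triples(item, triples)))
        else:
            triples.add((subject, field, value))
    return subject


responses = [
    {"human": {"id": "HAN", "name": "Han Solo",
               "friends": [{"id": "LEIA"}, {"id": "LUKE"}]}},
    {"human": {"id": "LUKE", "name": "Luke Skywalker",
               "friends": [{"id": "HAN"}]}},
    {"human": {"id": "LEIA", "name": "Leia Organa", "friends": None}},
]

triples: set = set()
for response in responses:
    add_triples(response["human"], triples)

for triple in sorted(triples):
    print(*triple, sep="\t")
```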

URI templates can use the value of any single-valued field via the {$fieldName} syntax; the URI-encoded string representation of that value is inserted at the placeholder. These values are assumed to be present in the JSON object, so templates typically reference non-nullable fields marked with the ! operator. If such a value is absent, the URI cannot be generated, and the associated JSON object cannot be added to the data graph.

URI templates may reference multiple fields, such as http://example.org/person/{$firstName}-{$lastName}, although this is probably rare.
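Template expansion can be sketched as a regular-expression substitution. The sketch assumes that values are rendered with Python's str() and that every character outside the unreserved URI set is percent-encoded (urllib.parse.quote with safe=""); the exact encoding rules of TopQuadrant's implementation are not specified here, and the function name expand_template is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import quote

# Matches {$fieldName}, where fieldName follows the GraphQL name syntax.
PLACEHOLDER = re.compile(r"\{\$([_A-Za-z][_0-9A-Za-z]*)\}")


def expand_template(template: str, obj: dict) -> Optional[str]:
    """Fill {$field} placeholders with URI-encoded field values.

    Returns None if a referenced field is missing or null, because then no
    URI can be generated for the object.
    """
    missing = []

    def substitute(match) -> str:
        value = obj.get(match.group(1))
        if value is None:
            missing.append(match.group(1))
            return ""
        return quote(str(value), safe="")

    uri = PLACEHOLDER.sub(substitute, template)
    return None if missing else uri


person = "http://example.org/person/{$firstName}-{$lastName}"
print(expand_template(person, {"firstName": "Leia", "lastName": "Organa"}))
# http://example.org/person/Leia-Organa
print(expand_template(person, {"firstName": "Leia"}))  # None
```

Returning None rather than a partially filled URI makes it explicit that such an object cannot be added to the data graph.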

Declaring Namespace Prefixes

Namespace prefixes are used to abbreviate URIs so that they do not need to be repeated over and over again, and so that changes to URIs just need to be made in a single place. We introduce the @prefixes directive to define such namespace prefixes for a GraphQL schema:

schema
	@prefixes(
		human: "http://example.org/human/",
		starwars: "http://starwars.com/data/ (default)"
	)
{
	query: Query
}

type Human @uri(template: "human:{$id}") {
	id: ID!
...

The notion of URIs and namespaces does not need to be limited to instances: even GraphQL types themselves may have a URI so that a type can become an entity in the data graph. For this purpose, the example above marks one of the namespace prefixes as the "default". This means that all types defined in the schema will (by default) use this namespace, and Human gets the URI http://starwars.com/data/Human, abbreviated as starwars:Human.
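A processor could read the @prefixes arguments roughly as follows. This sketch assumes that the arguments arrive as a dictionary of strings and that the "(default)" suffix is the only marker; the function names parse_prefixes and resolve_template are hypothetical.

```python
from typing import Optional

DEFAULT_MARKER = "(default)"


def parse_prefixes(args: dict) -> tuple:
    """Split @prefixes arguments into a prefix map and the default namespace."""
    prefixes = {}
    default_ns: Optional[str] = None
    for prefix, value in args.items():
        namespace = value.strip()
        if namespace.endswith(DEFAULT_MARKER):
            # "http://starwars.com/data/ (default)" -> "http://starwars.com/data/"
            namespace = namespace[: -len(DEFAULT_MARKER)].strip()
            default_ns = namespace
        prefixes[prefix] = namespace
    return prefixes, default_ns


def resolve_template(template: str, prefixes: dict) -> str:
    """Expand a leading prefix in a URI template, e.g. 'human:{$id}'."""
    prefix, sep, rest = template.partition(":")
    return prefixes[prefix] + rest if sep and prefix in prefixes else template


prefixes, default_ns = parse_prefixes({
    "human": "http://example.org/human/",
    "starwars": "http://starwars.com/data/ (default)",
})
print(resolve_template("human:{$id}", prefixes))  # http://example.org/human/{$id}
print(default_ns + "Human")                        # http://starwars.com/data/Human
```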

Importing and Reusing GraphQL Schemas

We took the idea of URIs a step further and assigned a URI to each GraphQL schema. This makes it possible for other schemas to reference our schema, solving a well-known GraphQL problem known as schema duplication. For example, if one schema defines a type Movie and you have a field that links your local Actor type with such movies, wouldn't it be nice to just reference the existing Movie type instead of repeating its definition over and over? The @graph directive can help.

In the following example, the schema gets the URI http://starwars.com/data/ and it imports (or: includes) another, more general schema about movies.

schema
	@graph(
		uri: "http://starwars.com/data/",
		imports: [ "http://movies.org/data/" ]
	)
	@prefixes(
		starwars: "http://starwars.com/data/ (default)",
		movies: "http://movies.org/data/"
	)
...
type Actor {
	appearedIn: movies_Movie
	...
}

The concept of referencing URIs from external files is well known from the RDF and Linked Data worlds, where engines may even dynamically look up other schemas on the Web using the provided URIs. You don't need to take it that far: the URIs may just as well be aliases for local files. In any case, given the namespace prefixes shown above, a field such as Actor.appearedIn can now reference the Movie type from the namespace with the prefix movies:. As the example shows, to use such prefixes in GraphQL names you need to use the underscore character instead of the colon (movies_Movie rather than movies:Movie), because colons are not allowed in GraphQL names. In string values, such as URI templates, the colon form is used.
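Translating a prefixed GraphQL type name back into a URI can be sketched as below. The prefixes and default namespace mirror the example schema; the rule that only the part before the first underscore is checked against the declared prefixes is an assumption of this sketch, so a local type name such as Star_Ship (where Star is not a declared prefix) stays in the default namespace. The function name type_uri is hypothetical.

```python
PREFIXES = {
    "starwars": "http://starwars.com/data/",
    "movies": "http://movies.org/data/",
}
DEFAULT_NS = "http://starwars.com/data/"


def type_uri(name: str) -> str:
    """Map a GraphQL type name to a URI.

    'movies_Movie' uses the namespace of the prefix 'movies'; names without a
    declared prefix belong to the default namespace.
    """
    prefix, sep, local = name.partition("_")
    if sep and local and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return DEFAULT_NS + name


print(type_uri("movies_Movie"))  # http://movies.org/data/Movie
print(type_uri("Actor"))         # http://starwars.com/data/Actor
```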

Organizing GraphQL schemas into multiple files is key to realizing modularity and reuse potential. However, since this mechanism is not part of the standard GraphQL spec, we had to develop a pre-processor that combines the various files into one.
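The core of such a pre-processor is resolving imports transitively. The sketch below is hypothetical: it assumes a dictionary from schema URI to the URIs it imports and a dictionary from schema URI to its SDL text. It deliberately omits the renaming of imported types (for example Movie to movies_Movie) that a real combiner would also need.

```python
def collect_schemas(root: str, imports_of: dict) -> list:
    """Return root and all transitively imported schema URIs.

    Dependencies come before the schemas that import them, each URI appears
    once, and import cycles do not cause infinite recursion.
    """
    ordered = []
    visited = set()

    def visit(uri: str) -> None:
        if uri in visited:
            return
        visited.add(uri)
        for dependency in imports_of.get(uri, []):
            visit(dependency)
        ordered.append(uri)

    visit(root)
    return ordered


def combine(root: str, imports_of: dict, sources: dict) -> str:
    """Concatenate the SDL text of all schemas into one document."""
    return "\n\n".join(sources[uri] for uri in collect_schemas(root, imports_of))


imports_of = {"http://starwars.com/data/": ["http://movies.org/data/"]}
print(collect_schemas("http://starwars.com/data/", imports_of))
# ['http://movies.org/data/', 'http://starwars.com/data/']
```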

Now that types can have unique identifiers, the ability to reference GraphQL types from other schemas opens up some interesting possibilities as shown in the next section.

Classes and Inheritance for GraphQL

Many data models that are accessible via GraphQL APIs are in fact object-oriented, using classes that can be arranged into a subclass hierarchy. GraphQL supports only a very shallow notion of inheritance: object types can implement interfaces, and in earlier editions of the spec interfaces could not inherit from each other (the October 2021 edition allows interfaces to implement other interfaces). Furthermore, even if your GraphQL types implement an interface, you still need to repeat all field declarations. While there may be good reasons for this simple design if simplicity is the primary goal, we argue that it wastes an opportunity to use GraphQL schemas as a much more general data modeling language.

The @class directive presented here aligns GraphQL with a number of object-oriented technologies and modeling languages such as RDF and SHACL. @class can be used to annotate GraphQL types to instruct a processor that the type represents a class of objects, and that the class inherits fields from its superclasses. To make best use of this directive, you need to use a GraphQL pre-processor, or compiler, that takes the GraphQL type definitions and "flattens" them so that all fields from superclasses are repeated in the subclasses.

The following example states that Human and Droid are subclasses of Character. After pre-processing, all fields from the Character base class also apply to Human and Droid.

type Character @class {
	appearsIn: [Episode]!
	friends: [Character]
	id: ID!
	name: String!
}

type Droid @class(subClassOf: Character) {
	primaryFunction: String
}

type Human @class(subClassOf: Character) {
	height: Float
	homePlanet: String
	mass: Float
	starships: [Starship]
}

The advantages of this syntax become apparent if you want to add more subclasses, such as different types of droids with additional properties. With pure GraphQL you would need to repeat all field definitions on each level of the class "hierarchy", quickly producing an unmaintainable code base.

Note that you can declare multiple superclasses by passing an array as the value of subClassOf.
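The flattening performed by such a pre-processor can be sketched as follows. The input format (a dictionary per class with subClassOf and fields) and the function names are hypothetical, and the ordering and override rules noted in the docstring are assumptions rather than documented behavior.

```python
def flatten(classes: dict) -> dict:
    """Return all fields (inherited and own) for every class.

    classes maps a class name to {"subClassOf": [...], "fields": {name: type}}.
    Assumptions of this sketch: the hierarchy has no cycles, inherited fields
    come first (superclasses in declaration order), and a field redeclared in
    a subclass replaces the inherited declaration.
    """
    cache = {}

    def all_fields(name: str) -> dict:
        if name not in cache:
            fields = {}
            for superclass in classes[name]["subClassOf"]:
                fields.update(all_fields(superclass))
            fields.update(classes[name]["fields"])
            cache[name] = fields
        return cache[name]

    return {name: all_fields(name) for name in classes}


def to_sdl(name: str, fields: dict) -> str:
    """Render a flattened class as a plain GraphQL object type."""
    body = "\n".join(f"\t{field}: {type_}" for field, type_ in fields.items())
    return f"type {name} {{\n{body}\n}}"


classes = {
    "Character": {"subClassOf": [], "fields": {
        "appearsIn": "[Episode]!", "friends": "[Character]",
        "id": "ID!", "name": "String!"}},
    "Droid": {"subClassOf": ["Character"],
              "fields": {"primaryFunction": "String"}},
    "Human": {"subClassOf": ["Character"], "fields": {
        "height": "Float", "homePlanet": "String",
        "mass": "Float", "starships": "[Starship]"}},
}

print(to_sdl("Droid", flatten(classes)["Droid"]))
```

The output is an ordinary GraphQL type in which the four Character fields are repeated before primaryFunction, which is exactly the repetition that @class saves schema authors from writing by hand.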

Now let's combine this concept with the unique identifiers from the first section. We stated that types (here: classes) can have URIs and thus be treated as data in the data graph. In the case of the data derived from the JSON delivered by the GraphQL service, we can now add the following "triples" to our data graph:

URI from JSON Object	Field	Value
`human:HAN`	`type`	`starwars:Human`
`human:LEIA`	`type`	`starwars:Human`
`human:LUKE`	`type`	`starwars:Human`

The field type is simply assumed to be present for all classes, and is used to link an instance with its type(s). For those familiar with RDF, this is exactly what the property rdf:type does.

Combined with the information about the class hierarchy, a client can now make some "inferences". For example, if it knows that human:HAN is a starwars:Human, it can infer that human:HAN is also an instance of starwars:Character, so whatever knowledge or functionality we have about characters also applies to humans. This includes the rich semantic constraints introduced in the following section.
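This inference can be sketched as a walk up the class hierarchy. The sketch uses the abbreviated URIs from the table above as plain strings, and the function names all_types and instances_of are hypothetical; an RDF store with RDFS subclass reasoning could compute the same result.

```python
SUPERCLASSES = {
    "starwars:Human": ["starwars:Character"],
    "starwars:Droid": ["starwars:Character"],
}

TYPE_TRIPLES = [
    ("human:HAN", "type", "starwars:Human"),
    ("human:LEIA", "type", "starwars:Human"),
    ("human:LUKE", "type", "starwars:Human"),
]


def all_types(direct_types: set) -> set:
    """Return the given classes plus all of their (transitive) superclasses."""
    result = set()
    stack = list(direct_types)
    while stack:
        cls = stack.pop()
        if cls not in result:  # also guards against cycles
            result.add(cls)
            stack.extend(SUPERCLASSES.get(cls, []))
    return result


def instances_of(cls: str) -> set:
    """All subjects whose direct or inferred types include cls."""
    direct = {}
    for subject, _, type_ in TYPE_TRIPLES:
        direct.setdefault(subject, set()).add(type_)
    return {s for s, types in direct.items() if cls in all_types(types)}


print(sorted(instances_of("starwars:Character")))
# ['human:HAN', 'human:LEIA', 'human:LUKE']
```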

Constraints on GraphQL Fields

The GraphQL schema language intentionally defines only a very focused and simple syntax to define fields. Basically each field has a type, may be null, and may be an array. But that's all. In many cases it would be beneficial to declare additional constraints on fields. For example:

latitude values must be between -90 and 90
startDate of an event must be before the endDate
the minimum length of a userName is 8 characters
the values of countryCode must be from a given list of string constants

Such constraints can be used to validate whether instance data (and JSON objects) pass certain quality checks. Constraints can also be used to drive user input widgets, for example to offer a drop-down list of the available country codes. Constraints are well known from other schema languages such as XML Schema, and also from the RDF-based Shapes Constraint Language SHACL.

Thankfully, GraphQL includes an extension mechanism, directives, that can be used to define additional constraints. Some tools will want to use them, while other tools can simply ignore them without ill side effects. We use the directive @shape to annotate field definitions with constraints. The term "shape" was chosen because it nicely illustrates that these constraints are about the shape of the allowed data, and also to relate to the Shapes Constraint Language, which defines exactly equivalent constraint types.

The built-in constraint parameters of @shape, summarized in a table further below, intuitively map to the corresponding SHACL constraints from the sh: namespace. In that table, the type Name represents GraphQL names (without quotes) or String values that can be translated into a URI using the given prefix mapping. Let's look at an example:

# A user account
type User {
	name: String!
	age: Int              @shape(minInclusive: 18)
	gender: String        @shape(in: ["male", "female"])
	purchases: [Purchase] @shape(maxCount: 100)
}

type Purchase {
	# The internal ID of the product
	productId: String!    @shape(minLength: 8, pattern: "^[0-9]+$")
}

Above, the age of a User must be 18 or more, its gender must be either "male" or "female", there cannot be more than 100 purchases and the productId must have at least 8 numeric characters.
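A validator for the constraints used in this example might look like the sketch below. It covers only a subset of the parameters and uses hypothetical function names. It writes the productId pattern anchored as "^[0-9]+$" because, as with sh:pattern in SHACL, a pattern succeeds if it matches any part of the value, so an unanchored "[0-9]+" would accept any value containing a single digit.

```python
import re

USER_SHAPES = {
    "age": {"minInclusive": 18},
    "gender": {"in": ["male", "female"]},
    "purchases": {"maxCount": 100},
}
PURCHASE_SHAPES = {"productId": {"minLength": 8, "pattern": "^[0-9]+$"}}


def values_of(obj: dict, field: str) -> list:
    """Treat a missing/null field as no values and a scalar as one value."""
    value = obj.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def validate(obj: dict, shapes: dict) -> list:
    """Return a list of violation messages (empty if obj conforms)."""
    violations = []
    for field, c in shapes.items():
        values = values_of(obj, field)
        if "minCount" in c and len(values) < c["minCount"]:
            violations.append(f"{field}: fewer than {c['minCount']} values")
        if "maxCount" in c and len(values) > c["maxCount"]:
            violations.append(f"{field}: more than {c['maxCount']} values")
        for v in values:
            if "minInclusive" in c and v < c["minInclusive"]:
                violations.append(f"{field}: {v!r} is below {c['minInclusive']}")
            if "maxInclusive" in c and v > c["maxInclusive"]:
                violations.append(f"{field}: {v!r} is above {c['maxInclusive']}")
            if "in" in c and v not in c["in"]:
                violations.append(f"{field}: {v!r} is not one of {c['in']}")
            if "minLength" in c and len(str(v)) < c["minLength"]:
                violations.append(f"{field}: {v!r} is shorter than {c['minLength']}")
            # Like SHACL, the pattern may match anywhere in the value.
            if "pattern" in c and re.search(c["pattern"], str(v)) is None:
                violations.append(f"{field}: {v!r} does not match {c['pattern']}")
    return violations


print(validate({"name": "Ann", "age": 16, "gender": "female"}, USER_SHAPES))
# ['age: 16 is below 18']
print(validate({"productId": "ABC12345"}, PURCHASE_SHAPES))
```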

The table below summarizes the supported constraint types. Each parameter corresponds to the SHACL parameter of the same name (for example minCount to sh:minCount), whose definition in the SHACL specification gives the formal details.

Parameter	GraphQL Type	Description
class	`Name` or `[Name]`	All values must be instances of `$class` (or a subclass). If multiple classes are specified then the values must be instances of all of them.
datatype	`Name`	All values must be well-formed literals of the given XSD datatype, e.g. `"xsd:date"`.
minCount	`Int`	There must be at least `$minCount` values.
maxCount	`Int`	There must be at most `$maxCount` values.
minExclusive	as datatype	All values must be `>` `$minExclusive`.
minInclusive	as datatype	All values must be `>=` `$minInclusive`.
maxExclusive	as datatype	All values must be `<` `$maxExclusive`.
maxInclusive	as datatype	All values must be `<=` `$maxInclusive`.
minLength	`Int`	All values must have a string length of at least `$minLength`.
maxLength	`Int`	All values must have a string length of at most `$maxLength`.
pattern	`String`	All values must match the regular expression `$pattern`.
flags	`String`	Optional flags such as `"i"` (ignore case) for the regular expression matching using `pattern`.
equals	`Name`	The set of all values must be equal to the set of values of the field/property `$equals`.
disjoint	`Name`	The set of all values must not overlap with the set of values of the field/property `$disjoint`.
lessThan	`Name`	All values must be `<` each of the values of the field/property `$lessThan`.
lessThanOrEquals	`Name`	All values must be `<=` each of the values of the field/property `$lessThanOrEquals`.
node	`Name` or `[Name]`	All values must conform to the given shape(s), i.e. must not violate any of the constraints defined by `$node`.
hasValue	as datatype	One of the values must be `$hasValue`.
in	`[`as datatype`]`	All values must be from the given list of values.
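The property pair constraints in the table (equals, disjoint, lessThan, lessThanOrEquals) compare the values of two fields of the same object. The sketch below uses hypothetical function names, with Python dates standing in for xsd:date values, and checks the startDate/endDate rule mentioned earlier, which would be declared as @shape(lessThan: endDate) on the startDate field.

```python
from datetime import date


def values_of(obj: dict, field: str) -> list:
    """Treat a missing/null field as no values and a scalar as one value."""
    value = obj.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_pair(obj: dict, field: str, kind: str, other: str) -> bool:
    """Check a property pair constraint between two fields of obj."""
    a, b = values_of(obj, field), values_of(obj, other)
    if kind == "equals":
        return set(a) == set(b)
    if kind == "disjoint":
        return set(a).isdisjoint(b)
    if kind == "lessThan":
        return all(x < y for x in a for y in b)
    if kind == "lessThanOrEquals":
        return all(x <= y for x in a for y in b)
    raise ValueError(f"unsupported constraint: {kind}")


event = {"startDate": date(2024, 5, 1), "endDate": date(2024, 5, 3)}
print(check_pair(event, "startDate", "lessThan", "endDate"))  # True
print(check_pair(event, "endDate", "lessThan", "startDate"))  # False
```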

Most of the constraint types above can also be used on user-defined scalar types. The following example defines a scalar type Latitude, which is a floating point number between -90 and 90:

scalar Latitude  @shape(datatype: "xsd:float", minInclusive:  -90, maxInclusive:  90)
scalar Longitude @shape(datatype: "xsd:float", minInclusive: -180, maxInclusive: 180)

type GeoPoint {
	lat: Latitude
	long: Longitude
}

Other typical examples of such scalar types include zip codes, social security numbers and country codes. Declaring the constraints on a scalar type means that they can be reused and do not need to be repeated at each occurrence. Furthermore, scalar types that declare a datatype have a well-defined meaning across implementations, whereas the GraphQL standard leaves the representation of custom scalars largely unspecified, which can raise questions such as whether a scalar value should become a string or a number.

The following constraint arguments are supported for scalar types: datatype, minInclusive, minExclusive, maxInclusive, maxExclusive, minLength, maxLength, pattern, flags, in.
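Because the constraints are attached to the scalar type, a validator only needs to look them up by type name wherever the scalar is used. In the hypothetical sketch below, the shapes of Latitude and Longitude are declared once and applied to both fields of GeoPoint; treating xsd:float as "any JSON number" is a simplification.

```python
SCALAR_SHAPES = {
    "Latitude": {"datatype": "xsd:float", "minInclusive": -90, "maxInclusive": 90},
    "Longitude": {"datatype": "xsd:float", "minInclusive": -180, "maxInclusive": 180},
}
GEOPOINT_FIELDS = {"lat": "Latitude", "long": "Longitude"}


def conforms(value, shape: dict) -> bool:
    """Check one value against a scalar shape (a small subset of parameters)."""
    if shape.get("datatype") == "xsd:float":
        # Accept JSON numbers only; bool is excluded because it subclasses int.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
    if "minInclusive" in shape and value < shape["minInclusive"]:
        return False
    if "maxInclusive" in shape and value > shape["maxInclusive"]:
        return False
    return True


def invalid_fields(obj: dict, field_types: dict) -> list:
    """Fields whose non-null value violates the shape of its scalar type."""
    return [
        field for field, scalar in field_types.items()
        if obj.get(field) is not None
        and not conforms(obj[field], SCALAR_SHAPES[scalar])
    ]


print(invalid_fields({"lat": 47.37, "long": 8.54}, GEOPOINT_FIELDS))  # []
print(invalid_fields({"lat": 91.0, "long": -200}, GEOPOINT_FIELDS))   # ['lat', 'long']
```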

Display Metadata for GraphQL

This section introduces the @display directive that naturally extends GraphQL type definitions with information that is useful for user interfaces. Imagine you have a Customer record and would like to display it roughly as follows:

	Names:
		given name: 	Rolf-Michel
		family name: 	Massin

	Address:
		street:			9100 Oak Street
		zip code: 		91823
		country: 		USA

The basic elements of such a (form) layout are that fields can be organized into groups (such as "Address" above), and these groups as well as their fields are in a given order and have human-readable display labels. GraphQL already covers the ordering (fields are naturally sorted from top to bottom), so it is just a small step to make the schema significantly more useful:

type Customer {
	firstName: String			@display(group: NamesGroup,    label: "given name")
	lastName: String			@display(group: NamesGroup,    label: "family name")
	street: String				@display(group: AddressGroup)
	postalCode: String		@display(group: AddressGroup, label: "zip code", label_de: "Postleitzahl")
	country: String			@display(group: AddressGroup, defaultValue: "USA")
}

Field Groups

To define field groups, we use the @groups directive on the schema definition. All declared groups are global (and engines can even derive URIs for them from the argument names of the @groups directive).

schema
	@groups(
		NamesGroup: {
			label: "Names"
		},
		AddressGroup: {
			label: "Address"
			label_de: "Adresse"
		}
	)
...

Each entry in the @groups directive is an object that defines one or more labels. Use label to specify the "default" label. You can use properties like label_en to define labels for specific languages, including 5-character language codes such as label_en_AU for Australian visitors of your application.
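A user interface needs to pick one of these labels for the current user. The fallback order in the sketch below (the most specific language first, then the base language, then label) is an assumption, and pick_label is a hypothetical name.

```python
from typing import Optional


def pick_label(labels: dict, lang: Optional[str] = None) -> Optional[str]:
    """Choose the most specific label for a language tag such as 'en-AU'.

    Tries label_en_AU, then label_en, then the default label.
    """
    if lang:
        parts = lang.replace("-", "_").split("_")
        for i in range(len(parts), 0, -1):
            key = "label_" + "_".join(parts[:i])
            if key in labels:
                return labels[key]
    return labels.get("label")


address_group = {"label": "Address", "label_de": "Adresse"}
print(pick_label(address_group, "de-CH"))  # Adresse
print(pick_label(address_group, "fr"))     # Address
```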

Field groups are ordered from top to bottom by default, but you can also use order to assign a numeric position to each group. This is typically only needed when merging groups from multiple GraphQL schema files.

Once a group has been defined, individual fields can reference them using @display(group: ...) as shown in the example above.

Field Display Labels and Ordering

Fields can define display labels using @display(label: ...) as shown in the example above. This includes the option to define multiple, language-specific labels using label_de etc.

The natural order of fields for display purposes is the same as (from top to bottom) in the GraphQL schema. If you are relying on class inheritance and your object types are stitched together from multiple object types, then this approach does not work, and you may need to rely on the order argument to assign individual numeric values to each field. Basically, assign 0 to the upper-most field, then increase numbers. You can use floating point numbers if you want to squeeze your subclass field in between existing fields from a superclass.
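A form renderer could compute the display order along these lines. The sketch assumes that the @display and @groups metadata has already been extracted into dictionaries; falling back to the declaration position for items without an explicit order is an assumption, as are the function names display_order and layout.

```python
def display_order(items: list) -> list:
    """Sort by the explicit 'order' value, falling back to the declaration index."""
    keyed = [(item.get("order", index), item) for index, item in enumerate(items)]
    # Sort on the key only (stable), so dictionaries are never compared.
    return [item for _, item in sorted(keyed, key=lambda pair: pair[0])]


def layout(fields: list, groups: list) -> list:
    """Return (group label, [field labels]) pairs in display order."""
    ordered_fields = display_order(fields)
    return [
        (group.get("label", group["name"]),
         [f.get("label", f["name"]) for f in ordered_fields
          if f.get("group") == group["name"]])
        for group in display_order(groups)
    ]


# A subclass field (height) squeezed between inherited fields via order.
fields = [
    {"name": "id", "order": 0},
    {"name": "name", "order": 1},
    {"name": "height", "order": 0.5},
]
print([f["name"] for f in display_order(fields)])  # ['id', 'height', 'name']

customer_fields = [
    {"name": "firstName", "group": "NamesGroup", "label": "given name"},
    {"name": "lastName", "group": "NamesGroup", "label": "family name"},
    {"name": "street", "group": "AddressGroup"},
    {"name": "postalCode", "group": "AddressGroup", "label": "zip code"},
    {"name": "country", "group": "AddressGroup"},
]
groups = [{"name": "NamesGroup", "label": "Names"},
          {"name": "AddressGroup", "label": "Address"}]
print(layout(customer_fields, groups))
# [('Names', ['given name', 'family name']), ('Address', ['street', 'zip code', 'country'])]
```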

Default Values

Fields can have default values, meaning that such values can be inserted into displays and forms even if a given object does not actually declare any value for the field. Use @display(defaultValue: ...) as shown in the example above.
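Applying default values when building a form can be as simple as the hypothetical helper below, which assumes field metadata extracted from @display into dictionaries. It fills in the default only for display and leaves the underlying object unchanged, since the object does not actually declare a value.

```python
def form_values(obj: dict, fields: list) -> dict:
    """Values to show in a form; the object itself is not modified."""
    values = {}
    for field in fields:
        value = obj.get(field["name"])
        if value is None:
            value = field.get("defaultValue")  # may still be None
        values[field["name"]] = value
    return values


customer_fields = [
    {"name": "street"},
    {"name": "postalCode"},
    {"name": "country", "defaultValue": "USA"},
]
customer = {"street": "9100 Oak Street", "postalCode": "91823"}
print(form_values(customer, customer_fields))
# {'street': '9100 Oak Street', 'postalCode': '91823', 'country': 'USA'}
```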

Next Steps

The directives described here are implemented in TopQuadrant's products, but they can be used by any GraphQL system. For example, some systems may only use the @display annotations for form building, while others may also look for @shape constraints to validate input on those forms.

The design of the directives can also be seen as part of a more consistent design in which GraphQL is playing a more central role than just sending JSON back and forth. In this design, GraphQL lays the foundation of modeling, representing and storing data for an application.

In order to make use of data delivered by an existing (possibly 3rd party) GraphQL web service, the URI templating and namespace mechanisms can be used to turn JSON objects into RDF graphs. This means that the results of multiple JSON requests can be added into a single unified graph database instead of remaining in disconnected tree structures.

To support this vision, we have developed a translation between GraphQL Schemas and the RDF world using the Shapes Constraint Language SHACL. This basically enables the use of GraphQL schemas as RDF domain models (aka ontologies). Benefits include the ability to define complex class hierarchies with a more flexible form of inheritance, the ability to extend GraphQL schemas with semantically richer constraints and inferencing rules, and cross-linkage between multiple schemas to reuse definitions, leading to a powerful enterprise knowledge graph. Details of the translation are described on the GraphQL Schemas to RDF/SHACL page.

Closing the circle, we also made it possible to automatically generate new GraphQL schemas that expose different viewpoints on underlying RDF data, including the ability to ask almost arbitrary queries. The mechanism that turns an RDF/SHACL data model into GraphQL schemas is described in Querying RDF Graphs with GraphQL. Since the SHACL shapes can be auto-generated from GraphQL schemas, users can benefit from RDF graph technology even if they are not familiar with SHACL.