In part 3 of this blog series we’ll look at two types of reference data, external and internal, and at why the ability to capture metadata about reference data is critical to understanding and using it.

Increasingly, organizations prefer to use standards as their reference data. Examples include country and currency codes from ISO, occupation codes from the Bureau of Labor Statistics, NAICS industry codes, and security identifiers from Bloomberg. Using external reference data instead of creating your own codes has advantages, such as:

  • You can rely on the third party to create and maintain data.
  • It is often more complete and future-proof than what you would have created, since organizations that maintain such datasets have to consider a broad range of requirements.
  • Integration can become easier as other parties may also be using the same standards.
  • In some cases, you may be required to use the standard version – for example, for certain government reporting.

Most organizations use both external and internal reference data. They use external reference data in situations where a suitable “standard” dataset exists. They use internal reference data for codes that are unique to the organization, such as product categories or location codes, and sometimes for legacy codes where suitable standards exist but the organization has already created its own alternatives.

As we discussed in part 2 of this series, reference data is connected, and it is common for organizations to connect external and internal reference datasets and extend the external datasets with internal organization-specific codes. Some standards even have conventions (such as an allocated group of codes) designed specifically for extensions.
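As one illustration of such a convention, ISO 3166-1 sets aside a group of two-letter country codes (AA, QM to QZ, XA to XZ, and ZZ) for user assignment. The following is a minimal sketch, not a description of how any particular RDM product works, of extending an external code list with internal codes while checking that they stay inside that reserved group. The function names and the sample internal code are hypothetical.

```python
from string import ascii_uppercase
from typing import Dict


def is_user_assigned(code: str) -> bool:
    """True if `code` is in the ISO 3166-1 alpha-2 user-assigned group
    (AA, QM-QZ, XA-XZ, ZZ)."""
    if len(code) != 2 or any(ch not in ascii_uppercase for ch in code):
        return False
    first, second = code[0], code[1]
    return (
        code in ("AA", "ZZ")
        or (first == "Q" and second >= "M")
        or first == "X"
    )


def extend_with_internal(
    external: Dict[str, str], internal: Dict[str, str]
) -> Dict[str, Dict[str, str]]:
    """Combine external and internal codes, recording where each came from.

    Internal codes are rejected if they collide with an external code or
    fall outside the group reserved for extensions.
    """
    merged = {
        code: {"name": name, "source": "external"}
        for code, name in external.items()
    }
    for code, name in internal.items():
        if code in external:
            raise ValueError(f"{code} is already defined by the standard")
        if not is_user_assigned(code):
            raise ValueError(f"{code} is outside the user-assigned group")
        merged[code] = {"name": name, "source": "internal"}
    return merged


# Illustrative data only: a tiny slice of the standard plus one internal code.
standard_codes = {"DE": "Germany", "FR": "France"}
internal_codes = {"XA": "Offshore operations (internal)"}
country_codes = extend_with_internal(standard_codes, internal_codes)
```

Keeping a `source` marker on every code is one simple way to preserve the distinction between what the maintenance agency publishes and what the organization has added, which matters when the next update of the standard arrives.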

While external and internal datasets are very similar from a usage perspective, they require different management and governance processes. Metadata is critical to preserving the meaning and usefulness of data, and to supporting data governance processes. Some of the metadata that organizations need to capture about a reference dataset is the same for external and internal reference data. This includes information such as:

  • Dataset name
  • Description
  • Status
  • Responsible parties
  • Where it is used
  • The meaning of each of the dataset fields

External reference datasets also require additional metadata such as:

  • What standard it represents
  • Who the maintenance agency is
  • What type of license or subscription an organization may need to use it
  • How often it is updated
  • What format updates are delivered in
  • What procedures the organization has put in place to onboard these updates
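The two lists above can be sketched as a pair of record types, with the external-dataset record adding its items on top of the common ones. This is only an illustration under assumed names: the class names, field names, and sample values below are hypothetical and do not reflect the data model of any specific product.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class DatasetMetadata:
    """Metadata common to internal and external reference datasets."""
    name: str
    description: str
    status: str
    responsible_parties: List[str]
    used_in: List[str]
    field_meanings: Dict[str, str]  # dataset field name -> its meaning


@dataclass
class ExternalDatasetMetadata(DatasetMetadata):
    """Additional metadata needed for externally maintained datasets."""
    standard: str
    maintenance_agency: str
    license_type: str
    update_frequency: str
    update_format: str
    onboarding_procedure: str


def missing_metadata(meta: DatasetMetadata) -> List[str]:
    """Return the names of metadata items that have been left empty."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name)]


# Illustrative values only; two items are deliberately left blank.
currency_codes = ExternalDatasetMetadata(
    name="Currency codes",
    description="Codes for currencies used in transactions",
    status="Active",
    responsible_parties=["Finance data steward"],
    used_in=["Billing", "General ledger"],
    field_meanings={"code": "Three-letter currency code",
                    "name": "Currency name"},
    standard="ISO 4217",
    maintenance_agency="",
    license_type="",
    update_frequency="As published by the maintenance agency",
    update_format="XML",
    onboarding_procedure="Steward reviews changes before publishing",
)

gaps = missing_metadata(currency_codes)
```

A check like `missing_metadata` hints at why the metadata is useful for governance: gaps such as an unrecorded license or maintenance agency can be detected and assigned to a responsible party.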

While we have seen organizations that maintain metadata about the reference data separately from the reference data itself, such approaches are rarely effective. For example, separating metadata from data by creating Excel spreadsheets or Access databases with this information doesn’t work well. Invariably, this results in the metadata not being available to key stakeholders or users, multiple conflicting versions of metadata being used by different groups, and a general disconnect between the state of the data and the state of its metadata.

In selecting a Reference Data Management solution, it is important to consider whether it supports capturing metadata and using it to facilitate reference data governance. TopBraid RDM comes with a predefined set of the most useful metadata for both internal and external reference datasets. Each organization may also have some unique requirements for additional metadata. TopBraid RDM is fully model driven. Designed for extensibility, it makes adding new metadata quick and easy. This is true not only for the metadata about reference datasets, but also for the fields within a dataset.

We were recently asked (see this recent post for more background):

“Could one create a pre-specified fixed set of fields and metadata for each type of reference dataset?”

The answer is “yes,” except for the “fixed” part. Some fields can be anticipated, but each reference dataset is unique and each organization has unique requirements. It may be helpful to have some standard templates. However, they will always need to be extended. Putting the flexibility and power in the hands of data stewards and other business users of RDM solutions to create such extensions themselves is key to the success of reference data management and governance.
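The idea of a standard template that is extended rather than fixed can be sketched in a few lines. The template contents, the `extend_template` helper, and the added fields are all hypothetical and serve only to illustrate the principle.

```python
from typing import Dict

# Hypothetical starting template: field name -> description.
BASE_TEMPLATE: Dict[str, str] = {
    "code": "Unique code within the dataset",
    "label": "Human-readable name",
    "status": "Whether the code is active or deprecated",
}


def extend_template(
    template: Dict[str, str], extra_fields: Dict[str, str]
) -> Dict[str, str]:
    """Return a new field set: the template plus dataset-specific fields.

    The template itself is left unchanged, and redefining one of its
    fields is treated as an error rather than silently overwriting it.
    """
    clashes = sorted(set(template) & set(extra_fields))
    if clashes:
        raise ValueError(f"fields already in template: {clashes}")
    return {**template, **extra_fields}


# A data steward adds fields that only make sense for product categories.
product_category_fields = extend_template(
    BASE_TEMPLATE,
    {
        "owner_department": "Department responsible for the category",
        "margin_band": "Internal margin classification",
    },
)
```

The template supplies the fields that can be anticipated, and each dataset carries its own additions without altering the shared starting point.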

Extensibility and flexibility come up in most discussions about reference data management. Part 4 of this blog series will focus on the importance of giving the business users of a reference data solution the flexibility to easily add information fields to a reference dataset or to the metadata about the dataset.

Other blogs in this series:

Part 1: Addressing change in reference data
Part 2: Capturing and maintaining relationships between reference data
Part 4: Enriching reference data with new information fields