Sometimes when people think of reference data, they presume that all one needs to capture is a code and its name (a human-readable label) and, possibly, some standard metadata such as its status, effective date, source and person responsible.

In reality, however, every codeset has its own unique collection of information fields that are important to capture and manage. For example:

  • As we discussed in a previous blog in this series, codes have connections to other codes. Currencies, in particular, have relationships with countries.
  • With currencies, it can be important to store a symbol used to display a currency such as $ or £.
  • Some symbols are displayed before the numeric value and others after, so this is another distinct piece of information to include with the codes.
  • Currencies are associated with financial data. Organizations may want to keep information on how to round the amounts when making different calculations. Dollars may be rounded differently than Chilean pesos, for example.
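To make the points above concrete, here is a minimal Python sketch of what a currency record could look like once these extra properties are captured as explicit fields. The class name, field names and sample values are illustrative assumptions, not the schema of any particular product, and the symbol position shown for each currency is only one common display convention.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Tuple


@dataclass(frozen=True)
class Currency:
    """Hypothetical currency record with codeset-specific fields."""
    code: str                        # the code itself, e.g. an ISO 4217 code
    name: str                        # human-readable label
    symbol: str                      # symbol used for display
    symbol_before_amount: bool       # True if the symbol precedes the number
    decimal_places: int              # assumed rounding rule for amounts
    countries: Tuple[str, ...] = ()  # relationships to country codes

    def display(self, amount: Decimal) -> str:
        # Build the rounding quantum: 2 places -> 0.01, 0 places -> 1
        quantum = Decimal(1).scaleb(-self.decimal_places)
        rounded = amount.quantize(quantum, rounding=ROUND_HALF_UP)
        if self.symbol_before_amount:
            return f"{self.symbol}{rounded}"
        return f"{rounded} {self.symbol}"


# Illustrative sample rows
USD = Currency("USD", "US Dollar", "$", True, 2, ("US",))
CLP = Currency("CLP", "Chilean Peso", "$", True, 0, ("CL",))
CZK = Currency("CZK", "Czech Koruna", "Kč", False, 2, ("CZ",))

print(USD.display(Decimal("19.995")))  # $20.00
print(CLP.display(Decimal("1234.5")))  # $1235
print(CZK.display(Decimal("99.5")))    # 99.50 Kč
```

Because the symbol, its position and the rounding precision are separate fields, a consuming system can format or round an amount without any knowledge hard-coded per currency.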

As we can see, each reference dataset is unique, and each needs its own set of information fields or properties. At any point, a requirement for new fields may be identified, and the people responsible for managing the reference data must be able to modify its schema easily.

When a reference data management solution is not flexible enough to allow its users to add new fields on demand, users often end up storing all sorts of important information in some default field available to them such as a Comment or a Description field. They may also co-opt fields intended for other purposes, semantically “overloading” them. While this may work for human consumption, it doesn’t work at all for providing relevant information to other systems. People can carefully read what is in the field and figure out the meaning. Systems rely on structure to query for explicitly identified information.

We can see this problem with the Market Identifiers reference data shown below, where a Comment field is used to store a variety of different information, including whether a market is a dark pool. This is an example of an external or public reference dataset provided by a standards organization (ISO 10383 – Market Identifier Codes) where certain information that may be needed by business applications or analysts is given only in comments. This is not an uncommon practice with either external or internal (private) reference data.

Dark pools are private exchanges or forums for trading securities. They facilitate block trading by institutional investors and, unlike stock exchanges, are not accessible to the investing public. It is often important for a consuming application to know whether an exchange is a dark pool. With TopBraid Reference Data Manager (RDM), a data steward can simply add a new field to any reference dataset without having to make a system change request and ask for development or system administration resources to implement it. The screenshot below shows the addition of such a new field (column), which enables systems and people to query whether a given market is a dark pool, or to generate reports that list all the dark pools (or all the markets that are not).
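The difference between an overloaded Comment field and an explicit field can be sketched in a few lines of Python. This is only an illustration of the querying problem, not TopBraid RDM's API: the market codes, names, comments and the `is_dark_pool` field name are all hypothetical.

```python
# Hypothetical market records; none of these are real market identifier codes.
markets = [
    {"mic": "XAAA", "name": "Example Market A",
     "comment": "Dark pool for block trades", "is_dark_pool": True},
    {"mic": "XBBB", "name": "Example Market B",
     "comment": "Not a dark pool; lit order book", "is_dark_pool": False},
    {"mic": "XCCC", "name": "Example Market C",
     "comment": "", "is_dark_pool": False},
]


def dark_pools_from_comment(rows):
    """Guess dark pools by searching free text, as a system must do
    when the information lives only in a Comment field."""
    return [r["mic"] for r in rows if "dark pool" in r["comment"].lower()]


def dark_pools_from_field(rows):
    """Select dark pools using the explicit boolean field."""
    return [r["mic"] for r in rows if r["is_dark_pool"]]


print(dark_pools_from_comment(markets))  # ['XAAA', 'XBBB']
print(dark_pools_from_field(markets))    # ['XAAA']
```

The text search wrongly reports the second market, because a person reading "Not a dark pool" understands the negation and a substring match does not. The explicit field gives the correct answer without any interpretation.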

By putting direct editorial control in the hands of the authorized users, TopBraid RDM improves the usability and usefulness of reference data.

With all the different information associated with reference data, data quality becomes important. We may want not only to control the code (for example, that it must be four characters), but also to create rules that ensure the correctness of other fields. We will discuss this topic in more detail in the next installment of this series.
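As a small preview of what such rules might look like, here is a rough Python sketch. The four-character rule comes from the example above; restricting the code to uppercase letters and digits, and the boolean dark-pool field with its name `is_dark_pool`, are assumptions made for illustration only.

```python
import re

# Assumption: a code is exactly four uppercase letters or digits.
CODE_PATTERN = re.compile(r"[A-Z0-9]{4}")


def validate_market(row):
    """Return a list of rule violations for one hypothetical market record."""
    errors = []
    if not CODE_PATTERN.fullmatch(row.get("mic", "")):
        errors.append("mic must be exactly four uppercase letters or digits")
    # A rule on a field other than the code itself
    if not isinstance(row.get("is_dark_pool"), bool):
        errors.append("is_dark_pool must be true or false")
    return errors


print(validate_market({"mic": "XAAA", "is_dark_pool": True}))  # []
print(validate_market({"mic": "XAA", "is_dark_pool": "yes"}))  # two violations
```

Returning a list of violations, rather than stopping at the first one, lets a data steward see and fix every problem with a record in one pass.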

Other blogs in this series:

Part 1: Addressing change in reference data
Part 2: Capturing and maintaining relationships between reference data
Part 3: Supporting external and internal reference data