We conducted two polls with interesting responses and fielded several questions as part of our recent webinar, Applied Data Governance: A Day in the Life of a Reference Data Steward. The webinar focused on managing the use of reference data originating within the enterprise as well as reference datasets maintained externally by organizations responsible for industry standards.

Results of Polling Questions:

[Chart: results of polling question 1]
As you can see from the results of the first polling question (above), most respondents' organizations still rely on spreadsheets or custom-built solutions for reference data management (RDM). This suggests that many organizations would benefit from improved RDM practices and from software that supports more collaborative and centralized RDM.

[Chart: results of polling question 2]
In addition, in the second polling question (above), a significant majority of respondents indicated that they would prefer to manage multiple areas of data governance within a single solution. This shows that, as we demonstrated in the webinar, respondents see further benefits in managing reference data as part of a larger data governance initiative. With the continuing growth in awareness of the value of data governance generally, this is no surprise. It confirms that many organizations are looking for comprehensive data governance solutions, of which RDM would be a part.

Questions about TopBraid Enterprise Data Governance (EDG) and reference data management included:

Q1: How do you handle the case when an old code becomes obsolete and is replaced by one or more new codes? For example, when East Germany stopped being a separate country.

We showed a similar example today, although in reverse: Czechoslovakia being split into two countries. It is a best practice never to delete codes that are in use, and TopBraid EDG-RDM enforces this practice. Once the reference dataset’s status changes to “in use”, you will not be able to delete a code. Instead, the status of the code is changed. This is done because historical data used the code, and it is unrealistic to expect that all of this data will be modified. Thus, we need to know that the code existed and what it meant when it was in use.
In cases where an organization wants to migrate some of its historical data to the new codes, TopBraid EDG-RDM can help facilitate this process by capturing the relationships between old and new codes.
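
The idea of retiring a code rather than deleting it can be sketched in a few lines of Python. This is an illustration only, not how TopBraid EDG-RDM is implemented: the `Code` and `ReferenceDataset` classes, the status values, and the `replaced_by` field are hypothetical names chosen for the example. The country codes are the ISO 3166-1 alpha-2 codes for Czechoslovakia (CS), the Czech Republic (CZ), and Slovakia (SK).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Code:
    """One entry in a reference dataset (hypothetical structure)."""
    value: str
    label: str
    status: str = "active"  # "active" or "deprecated"
    replaced_by: List[str] = field(default_factory=list)


class ReferenceDataset:
    def __init__(self) -> None:
        self.codes: Dict[str, Code] = {}
        self.in_use = False

    def add(self, value: str, label: str) -> None:
        self.codes[value] = Code(value, label)

    def delete(self, value: str) -> None:
        # Once the dataset is in use, codes may only be deprecated.
        if self.in_use:
            raise ValueError(f"{value} is in use; deprecate it instead")
        del self.codes[value]

    def deprecate(self, value: str, replaced_by: List[str]) -> None:
        # Keep the old code and its meaning; record its successors.
        code = self.codes[value]
        code.status = "deprecated"
        code.replaced_by = list(replaced_by)


countries = ReferenceDataset()
countries.add("CS", "Czechoslovakia")
countries.add("CZ", "Czech Republic")
countries.add("SK", "Slovakia")
countries.in_use = True
countries.deprecate("CS", replaced_by=["CZ", "SK"])
```

Because the old code stays in the dataset with its label and successors, historical records that still carry "CS" remain interpretable, and a later migration can look up which new codes to choose from.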

Q2: Why are standards and software APIs important in reference data software?

That is really a two-part question. Regarding standards: being based on W3C Semantic Web standards is important to our customers because they know their information isn’t in some proprietary or binary format that we made up. We use open, published standards. This also benefits us because it makes it easier to interface our products and data with other products and data, which makes integration a lot simpler.

Integration leads into the second part of the question: software APIs. It is simple for other systems to retrieve data from TopBraid EDG with our RESTful API, and defining a new web service can be as easy as creating a saved search. Also, in 6.0 we’ve embraced GraphQL, which is rapidly growing in popularity among those who need to work with JSON output.

The TopBraid platform has tooling that lets you easily create scripts that can make web service calls to other systems, so setting up dynamic two-way integration by using web services is never very difficult.

Q3: What are the differences between and reasons for managing internal, external and what are called “enterprise standard” reference datasets?

Most organizations use both external and internal reference data, often called public and private reference data, respectively. They’ll use external (public) reference data in situations where a suitable “industry standard” dataset exists. They use internal (private) reference data for codes that are unique to the organization, such as product categories or location codes, and sometimes for legacy codes where suitable standards exist but the organization has already created its own alternatives.
Increasingly, organizations prefer to use standards as their reference data: for example, country and currency codes from ISO, occupation codes from the Bureau of Labor Statistics, NAICS industry codes, and security identifiers from Bloomberg. Using external reference data instead of creating your own codes has advantages, such as:

  • You can rely on the third party to create and maintain data.
  • The data is often more complete and future-proof than what you would have created, since organizations that maintain such datasets have to consider a broad range of requirements.
  • Integration can become easier as other parties may also be using the same standards.
  • In some cases, you may be required to use the standard version – for example, for certain government reporting.

It is also common for organizations to extend the external datasets with internal organization-specific codes. Some standards even have conventions (such as an allocated group of codes) designed specifically for extensions. Even when there is no applicable industry standard, it is a good practice for an organization to establish its own standard.

Organizations will have existing systems that use their own values for reference data entities. This doesn’t mean that all applications need to immediately move to using a standard. It may not be practically possible. Even if they start using new codes, old data will still be using the old codes. This is where crosswalks come into the picture.
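
A crosswalk is essentially a maintained mapping between the codes of two datasets. The sketch below assumes a hypothetical legacy system that stores its own country abbreviations; the legacy values, the record layout, and the `to_standard` helper are made up for illustration, while the target values are ISO 3166-1 alpha-2 codes.

```python
from typing import Optional

# Crosswalk from hypothetical legacy codes to ISO 3166-1 alpha-2 codes.
LEGACY_TO_ISO = {
    "GER": "DE",
    "UK": "GB",
    "USA": "US",
}


def to_standard(legacy_code: str) -> Optional[str]:
    """Return the ISO code for a legacy code, or None if unmapped."""
    return LEGACY_TO_ISO.get(legacy_code)


legacy_records = [
    {"customer": "A-100", "country": "GER"},
    {"customer": "A-101", "country": "UK"},
    {"customer": "A-102", "country": "XX"},
]

# Translate at read time; the stored legacy records are left unchanged.
translated = [
    {**rec, "country_iso": to_standard(rec["country"])}
    for rec in legacy_records
]

# Records whose legacy code has no entry in the crosswalk need review.
unmapped = [rec["customer"] for rec in translated if rec["country_iso"] is None]
```

Translating on read rather than rewriting the stored values lets old applications keep their codes while new consumers work with the standard ones.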

As you saw in the demo, TopBraid EDG-RDM supports all variants and versions of internal (private) and external (public) reference datasets, as well as versioning, crosswalks, and comparisons between reference datasets as needed.

Q4: What if we have multiple reference data sources?

We only demoed the import of a single set of reference data; however, multiple sets are typical, and TopBraid EDG can handle as many as desired. If these reference datasets describe the same objects, like the countries that were part of the demo, they can be related and aligned using additional crosswalks.

Q5: For API/Web service using reference data, can we customize the web service to flag issues/problems if “client data” don’t match with reference data?

TopBraid EDG provides validation services against which client datasets can be validated. These services can trigger other services or be customized to do other things you need them to do.
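
As a rough illustration of what such a check involves (not the TopBraid EDG validation service itself), the sketch below compares client records against a small reference dataset and reports values that are unknown or deprecated. The function name `validate`, the record layout, and the status values are assumptions made for this example.

```python
# Reference data: code -> status (hypothetical layout).
REFERENCE_CODES = {
    "CZ": "active",
    "SK": "active",
    "DE": "active",
    "CS": "deprecated",
}


def validate(records, field_name, reference):
    """Return a list of (row_index, value, problem) tuples."""
    issues = []
    for index, record in enumerate(records):
        value = record.get(field_name)
        status = reference.get(value)
        if status is None:
            # The value does not appear in the reference dataset at all.
            issues.append((index, value, "unknown code"))
        elif status != "active":
            # The value exists but should no longer be used in new data.
            issues.append((index, value, f"{status} code"))
    return issues


client_data = [
    {"id": 1, "country": "CZ"},
    {"id": 2, "country": "CS"},
    {"id": 3, "country": "ZZ"},
]

for index, value, problem in validate(client_data, "country", REFERENCE_CODES):
    print(f"row {index}: {value!r} -> {problem}")
```

Returning the issues as data, rather than only printing them, makes it straightforward for a calling service to decide whether to reject the submission, log a warning, or trigger a follow-up process.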

Q6: Is there a trial version for the RDM capabilities with EDG?

Yes, we offer an evaluation server for TopBraid EDG-RDM. For evaluation purposes, it is a private virtual machine hosted on our infrastructure, making it very easy for you to get started. We typically set up 5 user accounts. Along with the Reference Data Management package, you can also evaluate the capabilities in EDG for other types of assets, such as taxonomies or glossaries. For onsite-installed evaluations, we offer an affordable Jumpstart package: for a modest fee, you can get a 3- or 6-month term license plus a few days of consulting and training. We also work with interested organizations to define a custom Jumpstart package with details specifically suited to their goals and needs.