In the second half of 2015, TopQuadrant participated in many conversations about Reference Data Governance. In this blog post, we have compiled some of the top questions from these conversations. They include questions arising from a recent webinar and survey conducted with Aaron Zornes of the MDM Institute, from TopQuadrant’s sponsorship of and exhibits at several data governance conferences in 2015, from direct inquiries to TopQuadrant about TopBraid Reference Data Manager, and from the recent MDM Institute Field Report on TopBraid RDM.
(Unless otherwise indicated, responses to the questions below are from TopQuadrant.)
Questions about Reference Data Governance
Q1: Why is RDM a logical first step in your MDM strategy?
[Response by Aaron Zornes] Initial RDM investments are manageable: getting started won’t kill your budget the way CRM, ERP, CDI, PIM, or full-bore multi-domain MDM (even for a single domain) can … the price point is closer to $250K-$500K for the software, and it does not require 3-4X that amount in services like full MDM does … Clearly, RDM is often a good entry-level project to show success for an initial MDM investment, which can then be built on as a Data Governance model. However, it is important to note that the functionality required for an RDM solution is not aligned with every single MDM use case.
Q2: Do you see RDM as connected to MDM or can it work separately?
[Response by Aaron Zornes] RDM is sufficiently independent from traditional MDM (CDI, PIM) that third-party RDM can be a great first step toward your MDM strategy … it’s an easy chunk to bite off compared to mainstream MDM … While RDM aligns quite nicely with a majority of MDM use cases, the one notable exception is CDI projects (generally with volumes in the tens of millions) that have a strong focus on profiling and quality.
Q3: Where do you draw the line for reference data that should be in the central RDM and reference data that should remain in the enterprise systems?
Enterprise systems need access to reference data in order to deliver their business functionality. For example, codes are often used to populate auto-complete fields and drop-down lists on edit screens. A system could query RDM in real time to populate these, but in many cases doing so is prohibitive in terms of performance and reliability, so systems typically need to store reference data locally. This stored data should be treated as a “local cache” for the application’s use, while the RDM system serves as the master for reference data. Part of the data governance process that the RDM functionality must support is the regular distribution of reference data to the systems that cache it. This places a number of requirements on RDM. For example, it must capture which enterprise systems use each dataset and how they receive updates. It also needs to be able to verify the current state of each cache against the master version in RDM and report any differences.
Q4: Part of the RDM functionality overlaps with data quality solutions (e.g. data quality rules), so how do you see solving the silos of data quality and governance?
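To make the cache-verification requirement more concrete, here is a minimal Python sketch, assuming each cached dataset can be exported as a simple mapping of code to label. The `Subscription` record (which system uses which dataset, and how it gets updates), `compare_cache`, and `sync_report` are hypothetical names chosen for illustration; they are not part of TopBraid RDM or any product API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical structures for illustration; not TopBraid RDM's data model.


@dataclass
class Subscription:
    system: str         # enterprise system that caches the dataset locally
    dataset: str        # reference dataset the system uses
    update_method: str  # how it receives updates, e.g. "nightly file"


@dataclass
class CacheDiff:
    missing: List[str] = field(default_factory=list)  # in master, absent from cache
    extra: List[str] = field(default_factory=list)    # in cache, absent from master
    changed: List[str] = field(default_factory=list)  # in both, but labels differ

    def is_in_sync(self) -> bool:
        return not (self.missing or self.extra or self.changed)


def compare_cache(master: Dict[str, str], cache: Dict[str, str]) -> CacheDiff:
    """Compare a system's local cache (code -> label) with the master version."""
    diff = CacheDiff()
    for code, label in master.items():
        if code not in cache:
            diff.missing.append(code)
        elif cache[code] != label:
            diff.changed.append(code)
    diff.extra = [code for code in cache if code not in master]
    return diff


def sync_report(
    subscriptions: List[Subscription],
    master_data: Dict[str, Dict[str, str]],
    caches: Dict[Tuple[str, str], Dict[str, str]],
) -> List[str]:
    """One report line per subscription; caches are keyed by (system, dataset)."""
    lines = []
    for sub in subscriptions:
        cache = caches.get((sub.system, sub.dataset), {})
        diff = compare_cache(master_data[sub.dataset], cache)
        if diff.is_in_sync():
            status = "in sync"
        else:
            status = f"missing={diff.missing} extra={diff.extra} changed={diff.changed}"
        lines.append(f"{sub.system} [{sub.dataset}, {sub.update_method}]: {status}")
    return lines


if __name__ == "__main__":
    master = {"currency": {"USD": "US Dollar", "EUR": "Euro", "JPY": "Yen"}}
    caches = {("billing", "currency"): {"USD": "US Dollar", "EUR": "Euro (old)", "DEM": "Deutsche Mark"}}
    subs = [Subscription("billing", "currency", "nightly file")]
    for line in sync_report(subs, master, caches):
        print(line)  # billing [currency, nightly file]: missing=['JPY'] extra=['DEM'] changed=['EUR']
```

In practice a comparison like this would run for each subscribed system after a scheduled distribution, so a cache that missed an update shows up as missing or changed codes instead of drifting unnoticed.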
RDM must support data quality within reference data itself. For example, an organization can have rules about its product codes: how many characters they contain, how they are structured, and so on. Data quality rules that maintain the consistency and quality of reference data belong in the RDM system. Then there are issues of data quality in transactional data that uses reference data to describe it. Without RDM in place, it is all but guaranteed that reference data will be used incorrectly and inconsistently across different applications. In our survey, the need to address errors in transactional data was identified as a key driver for RDM adoption. Organizations that implement RDM see significant improvements in overall data quality.
Dedicated data quality solutions may be used to identify issues in transactional data. They can help clean your existing data, but they do not address systemic issues; they try to fix problems after the fact, which is always more expensive and less reliable. Also, to do their job, data quality solutions need the kind of knowledge that RDM must have: which codes are active and which should no longer be used, when and where they should be used, and so on. This means that if you put RDM in place, your data cleansing will become more effective. And, over time, you will not need to rely on data cleansing as much, because RDM will help prevent data issues before they happen.
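The point about codes that should no longer be used can be sketched as a simple check of transactional records against reference data. This is a minimal Python illustration under stated assumptions: each reference entry carries a status of `"active"` or `"deprecated"` and an optional successor code. `RefCode`, `check_records`, and the product codes in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RefCode:
    code: str
    status: str                        # assumed vocabulary: "active" or "deprecated"
    replaced_by: Optional[str] = None  # successor to suggest for deprecated codes


def check_records(records: List[dict], field_name: str,
                  reference: Dict[str, RefCode]) -> List[str]:
    """Flag transactional records whose code is unknown or no longer active."""
    issues = []
    for i, record in enumerate(records):
        value = record.get(field_name)
        entry = reference.get(value)
        if entry is None:
            issues.append(f"record {i}: unknown code {value!r}")
        elif entry.status != "active":
            hint = f" (use {entry.replaced_by})" if entry.replaced_by else ""
            issues.append(f"record {i}: deprecated code {value!r}{hint}")
    return issues


if __name__ == "__main__":
    # Hypothetical product codes for illustration.
    reference = {
        "P-100": RefCode("P-100", "active"),
        "P-050": RefCode("P-050", "deprecated", replaced_by="P-100"),
    }
    orders = [{"product": "P-100"}, {"product": "P-050"}, {"product": "X-999"}]
    for issue in check_records(orders, "product", reference):
        print(issue)
```

The same reference entries that drive this after-the-fact check could also drive validation at data entry, which is the preventive role the answer above describes.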
Q5: Why would I buy a standalone or purpose built RDM solution when my MDM solution offers an RDM component?
[Response by Aaron Zornes] RDM is sufficiently independent from traditional MDM (CDI, PIM) that third-party RDM as an adjunct is a valid strategy … mega-vendors’ RDM pricing models tend to frustrate rather than encourage widespread deployment of RDM capabilities.
Questions about TopBraid Reference Data Manager
Q6: How does TQ RDM differentiate itself from other products?
There are certain features that you’re going to find in any RDM solution, such as the ability to load code lists and other reference data, to edit that data, and to make it available in some format to the applications that use it. What sets TopBraid RDM apart from this pack is, first, its ability to store metadata at so many levels: metadata about the datasets, about the individual codes, about the systems using the data, and more. Second, business users such as data stewards can add and change the data fields associated with your reference data themselves (if they’ve been granted the right permissions), so they don’t have to make requests to IT staff and wait however many days for these changes to be made. This lets them be much more agile in responding to changes in the data they work with.
TopBraid RDM also lets these same data stewards connect datasets, so that, for example, a country name on a currency code list can be a hypertext link to that country’s entry on a country code list, where you can see additional data about the country that provides context for the currency code.
Another distinctive feature of TopBraid RDM is the ability for business users to define data quality rules, often by just filling out a form. They might specify that a numeric value has to fall within a certain range or that a particular code has to be a certain combination of letters and digits. If someone breaks one of these rules when entering data, they’re alerted immediately, and you can also run a report that scans an entire dataset for violated rules.
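Conceptually, a link between datasets means storing a reference to another entry rather than a copy of its text. The minimal Python sketch below assumes two in-memory code lists, with each currency entry pointing to a country entry by key (simplified to one country per currency). The dictionaries, the `QQQ`/`ZZ` placeholder codes, and the function names are hypothetical and do not reflect how TopBraid RDM actually stores links.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified code lists (one country per currency). Each currency
# entry stores the *key* of a country entry instead of copying the country name.
COUNTRIES: Dict[str, Dict[str, str]] = {
    "JP": {"name": "Japan", "region": "Asia"},
    "CH": {"name": "Switzerland", "region": "Europe"},
}

CURRENCIES: Dict[str, Dict[str, str]] = {
    "JPY": {"name": "Yen", "country": "JP"},
    "CHF": {"name": "Swiss Franc", "country": "CH"},
    "QQQ": {"name": "Example currency", "country": "ZZ"},  # deliberately dangling link
}


def follow_link(currency_code: str) -> Optional[Dict[str, str]]:
    """Resolve a currency's country link to the full country entry, if any."""
    currency = CURRENCIES.get(currency_code)
    if currency is None:
        return None
    return COUNTRIES.get(currency["country"])


def broken_links() -> List[str]:
    """Currency codes whose country link points to no country entry."""
    return [code for code, cur in CURRENCIES.items() if cur["country"] not in COUNTRIES]


if __name__ == "__main__":
    print(follow_link("JPY"))  # the linked country entry supplies context
    print(broken_links())
```

A check like `broken_links` illustrates one advantage of linking over copying: a link that points nowhere can be detected, whereas a copied country name that has gone out of date cannot.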
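Form-based rules of this kind can be thought of as data rather than code: a field name plus a range or a pattern. The Python sketch below assumes exactly those two rule types; `Rule`, `validate_on_entry` (standing in for the immediate alert), and `scan_dataset` (standing in for the whole-dataset report) are hypothetical names used only to illustrate the idea, not TopBraid RDM’s rule engine.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Rule:
    """A form-style rule on one field: a numeric range, a regex pattern, or both."""
    field_name: str
    message: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    pattern: Optional[str] = None

    def check(self, record: Dict[str, Any]) -> bool:
        value = record.get(self.field_name)
        if value is None:
            return False  # a missing value counts as a violation in this sketch
        if self.pattern is not None and re.fullmatch(self.pattern, str(value)) is None:
            return False
        if self.min_value is not None or self.max_value is not None:
            if not isinstance(value, (int, float)):
                return False  # range rules only apply to numbers
            if self.min_value is not None and value < self.min_value:
                return False
            if self.max_value is not None and value > self.max_value:
                return False
        return True


def validate_on_entry(record: Dict[str, Any], rules: List[Rule]) -> List[str]:
    """Messages to show immediately when a single record is saved."""
    return [rule.message for rule in rules if not rule.check(record)]


def scan_dataset(records: List[Dict[str, Any]], rules: List[Rule]) -> List[Tuple[int, str]]:
    """Report of (row index, message) for every violation in a dataset."""
    return [(i, msg) for i, rec in enumerate(records) for msg in validate_on_entry(rec, rules)]


if __name__ == "__main__":
    rules = [
        Rule("code", "code must be two letters followed by three digits", pattern=r"[A-Z]{2}\d{3}"),
        Rule("rate", "rate must be between 0 and 100", min_value=0, max_value=100),
    ]
    print(validate_on_entry({"code": "A1234", "rate": 150}, rules))
    print(scan_dataset([{"code": "AB123", "rate": 5}, {"code": "ab123", "rate": 50}], rules))
```

Because the rules are plain data, a form that edits them is enough to change validation behavior, which is the property that lets business users maintain rules without code changes.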
One more special feature I want to mention is the ability to easily define new web services that other systems can call when they need up-to-date reference data. This can be as simple as creating a saved search: when you save a search, RDM saves a URL that other systems can use in a web service call. More sophisticated options are also available. None of these options requires anything like Java coding and compiling, and they make it very straightforward to integrate TopBraid RDM with your other systems.
Q7: Can TQ RDM be used in industries other than financial services?
The world of financial services has many good examples of reference data, but whether you’re in manufacturing, media, health services, or pretty much any business where large organizations have multiple levels of data interacting with each other, good reference data management makes your operations more efficient, and TopBraid RDM can work with all of them.
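From the calling system’s side, consuming a saved search is an ordinary HTTP GET. The minimal Python sketch below uses only the standard library and assumes a hypothetical URL, an `Accept: application/json` header, and a JSON response shaped as a list of `{"code": ..., "label": ...}` objects; the actual URL and response format depend on how the saved search and service are configured in TopBraid RDM.

```python
import json
import urllib.request
from typing import Dict

# Hypothetical URL: the real one is whatever RDM generates for the saved search.
SAVED_SEARCH_URL = "https://rdm.example.com/saved-search/active-currency-codes"


def parse_codes(payload: bytes) -> Dict[str, str]:
    """Assumes a JSON array of {"code": ..., "label": ...} objects."""
    rows = json.loads(payload.decode("utf-8"))
    return {row["code"]: row["label"] for row in rows}


def fetch_codes(url: str = SAVED_SEARCH_URL, timeout: float = 10.0) -> Dict[str, str]:
    """Call the saved-search URL with a plain GET and return code -> label."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_codes(response.read())
```

Keeping parsing separate from the network call makes it easy to test the client against a sample payload, and a system that keeps a local cache of reference data could call `fetch_codes` on a schedule to refresh it.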
Q8: Why are standards and software APIs important in reference data software?
Standards are important to our customers because they find it reassuring that the data they save with our products is not in some binary format that our engineers made up, but uses open, published standards. The same thing brings benefits to us because it makes it easier for us to interface our products and data with other products and data, which makes integration a lot simpler. Another thing that makes this integration simpler is how easy it is for other systems to retrieve data from TopBraid RDM with RESTful web services (as I mentioned earlier, defining a new web service can be as easy as creating a saved search). The TopBraid platform underneath TopBraid RDM has tools that let you easily create scripts that make web service calls to other systems, so setting up dynamic two-way integration using web services is never very difficult.