In this blog post, we explore how a major multi-national life sciences/agricultural/pharma company used TopBraid EDG as a centralized, shared repository of all reference data, ontologies, taxonomies, and business glossaries.

Background

A major multi-national life sciences/agricultural/pharma company lacked consistent reference data. The company had been managing hundreds of data standards with spreadsheets and manual processes.

Challenge

Each of the many applications and users referred to commonly shared data (such as a location) in its own way. This was inefficient, created data interoperability problems, made it hard and expensive to aggregate data for reporting, and led to erroneous data use. Different downstream systems required different representations. A system for representing shared data needed to be flexible enough to support these representations and to support changes to the data, including changes “on the fly” from downstream systems.

Solution

TopBraid EDG was chosen to support users and other software applications in their need to integrate and discover information. The data was imported into EDG from existing sources using EDG’s “out of the box” import capabilities (see Figure 1). The company used EDG crosswalks to create mappings between the different datasets.
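To make the idea of a crosswalk concrete, here is a minimal Python sketch of a mapping between two datasets that identify the same entities with different codes. It is not TopBraid EDG's API or data model; the `CrosswalkEntry` class, the `build_crosswalk` and `translate` helpers, and the country-code example are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrosswalkEntry:
    """One mapping between equivalent codes in two datasets (hypothetical)."""
    source_code: str  # code used in the first dataset
    target_code: str  # equivalent code in the second dataset


def build_crosswalk(entries):
    """Index crosswalk entries by source code for direct lookup."""
    return {e.source_code: e.target_code for e in entries}


def translate(codes, crosswalk):
    """Split codes into translated values and codes that have no mapping."""
    mapped, unmapped = [], []
    for code in codes:
        if code in crosswalk:
            mapped.append(crosswalk[code])
        else:
            unmapped.append(code)
    return mapped, unmapped


# Assumed example: one system stores ISO 3166-1 alpha-2 country codes,
# another stores the alpha-3 codes for the same countries.
entries = [
    CrosswalkEntry("US", "USA"),
    CrosswalkEntry("DE", "DEU"),
    CrosswalkEntry("BR", "BRA"),
]
crosswalk = build_crosswalk(entries)
mapped, unmapped = translate(["US", "BR", "XX"], crosswalk)
print(mapped)    # ['USA', 'BRA']
print(unmapped)  # ['XX']
```

Returning the unmapped codes separately, rather than dropping them, gives data stewards a list of gaps to review.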

To ensure proper governance and standards compliance, the company used TopBraid EDG’s governance and collaboration capabilities to establish user roles, rights, and workflows. In addition to formal workflows, it also used ad hoc data curation task assignments and gave all stakeholders the ability to submit comments on the data.

Dashboards and reports present the status of tasks and the completeness of the data, making it easy for users to identify any issues with processes and data.

Ingest, auto-match and enrich processes can be invoked by users on demand or executed as scheduled services.
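As a rough illustration of what an auto-match and enrich step can look like, the Python sketch below matches incoming values against a reference dataset by normalized label or synonym, then replaces each matched value with the canonical label and attaches its reference ID. This is an assumption-laden sketch, not how TopBraid EDG implements these processes; the function names, the `ref_id` field, and the sample data are all hypothetical.

```python
def normalize(label):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(label.lower().split())


def build_index(reference):
    """Map every normalized label and synonym to its canonical reference ID."""
    index = {}
    for ref_id, entry in reference.items():
        for label in [entry["label"], *entry["synonyms"]]:
            index[normalize(label)] = ref_id
    return index


def auto_match_and_enrich(rows, reference, field="location"):
    """Return (enriched rows, rows that could not be matched)."""
    index = build_index(reference)
    enriched, unmatched = [], []
    for row in rows:
        ref_id = index.get(normalize(row[field]))
        if ref_id is None:
            unmatched.append(row)  # left for manual curation
            continue
        # Replace the raw value with the canonical label and add the ID.
        enriched.append({**row, field: reference[ref_id]["label"], "ref_id": ref_id})
    return enriched, unmatched


# Hypothetical reference dataset and incoming rows.
reference = {
    "loc-001": {"label": "United States", "synonyms": ["USA", "U.S."]},
    "loc-002": {"label": "Germany", "synonyms": ["Deutschland"]},
}
rows = [
    {"site": "A", "location": "  usa "},
    {"site": "B", "location": "Deutschland"},
    {"site": "C", "location": "Atlantis"},
]
enriched, unmatched = auto_match_and_enrich(rows, reference)
```

A function like this could be called on demand or from a scheduler. Rows that fail to match are returned rather than discarded so they can be routed to a curation task.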

Users can then ask TopBraid EDG to enrich the original datasets by adding or replacing data as needed. The normalized datasets are available for retrieval by users or for provisioning to downstream systems.

Figure 1: Import Spreadsheet using Pattern Selected

Figure 2: Taxonomy of Geographic Regions

Results

TopBraid EDG holds a growing number of the company’s taxonomies, glossaries, and reference datasets (see Figure 2). Reference datasets alone number over one thousand and are used by applications and users across the enterprise.

TopBraid EDG provides tools to search, manage, and curate the data, to answer ad hoc queries from data stakeholders and business applications, and to generate the reports needed for decision-making and data alignment.