Working with Data Lineage

Licensing and Enablement

The availability of any asset collection is determined by what is (a) licensed and (b) configured under Server Administration. To install a license or to view the currently licensed features, see the Administrator Guide section on Product Registration. To configure which licensed collection types are currently enabled or disabled, see the Administrator Guide section on EDG Configuration Parameters.

For general licensing information and available asset collections and packages, see the TopQuadrant website.

TopQuadrant Data Governance Packages

Lineage for EDG

Introduction to Lineage

Business needs that motivate interest in lineage range from policy and regulatory compliance to operational efficiency and analytics.

The scope of the data lineage information is usually determined by data governance – based on:

  • Regulatory compliance needs

  • Enterprise data management strategy

  • Data impact and reporting needs

  • Critical data elements of the organization

The scope then determines the nature and volume of metadata required to represent lineage.

Data lineage includes the data’s origins, what happens to it and where it moves over time. Enriched data lineage information may include:

  1. data models

  2. business glossary terms

  3. enterprise information systems

  4. software executables that use data assets

  5. software functions that perform transformations

  6. provenance of data assets

  7. data usage

  8. logical and physical flows

  9. parties, roles and stakeholder groups

A number of lineage and impact visualizations are possible, ranging over different asset types from the enterprise level to the data asset level. Lineage diagrams are generated by invoking the “Show Lineage” action on the “Explore” menu. The availability of this menu option depends on the asset type. It is active for the following asset types:

Data Asset

A Data Asset is any data item that is of value to the enterprise. It can be a database, a dataset or a data element.

Information Asset

An Information Asset is an artefact that is managed by an organizational entity. These include documents, forms, schedules, and any other work-product that is constructed from data assets and used both within and across organizational boundaries.

Provenance Type

Provenance Type is a category of asset types whose existence was derived from, or influenced by, some other entity, or that were originated by some party.

Technical Asset

A Technical Asset can be a software asset or a hardware asset. Software assets further partition into a number of sub-classes such as applications, systems, software modules and software executables.

Lineage Model

A Lineage Model establishes a context for determining how enterprise capabilities, business functions and information assets are dependent on data flows, applications, software executables and data transformations across data sources and sinks.

The following sections give further details for each of these asset types that support lineage. For most types, the description is given in the context of a lineage model.

Application Lineage

Applications can depend on other applications in a number of ways:

  1. A direct dependency can be expressed using the property edg:dependsOnDataFrom.

  2. An inferred dependency from the need for data from programs and functions that an application uses. The properties involved are edg:usesSoftwareProgram, edg:usesSoftwareFunction, edg:input and edg:output.

  3. An inferred dependency based on information assets that are required as specified by edg:requiresInformationAsset and edg:producesInformationAsset.

  4. Applications can also have service endpoints that specify source and target nodes.

  5. Another kind of dependency is a Logical Flow, whose purpose is to depict what is happening (from a business perspective) as opposed to how it is happening (from a technical perspective). This will be described in more detail in a later section.
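The difference between a direct and an inferred dependency can be illustrated with a minimal Python sketch. It is not how EDG computes lineage; it only mirrors the first two items in the list above. The resource names (app:Reporting, prog:BuildReport and so on) are hypothetical, and the sketch assumes that edg:input and edg:output link a software program to the data assets it reads and writes.

```python
# Hypothetical sample data as (subject, property, object) triples.
# Assumption: edg:input / edg:output link a program to a data asset.
TRIPLES = {
    ("app:Reporting", "edg:dependsOnDataFrom", "app:Ledger"),
    ("app:Reporting", "edg:usesSoftwareProgram", "prog:BuildReport"),
    ("app:Risk", "edg:usesSoftwareProgram", "prog:ScoreLoans"),
    ("prog:ScoreLoans", "edg:output", "data:LoanScores"),
    ("prog:BuildReport", "edg:input", "data:LoanScores"),
}


def objects(subject, prop):
    """Return all objects of the given property for a subject."""
    return {o for s, p, o in TRIPLES if s == subject and p == prop}


def direct_dependencies(app):
    """Applications named explicitly through edg:dependsOnDataFrom."""
    return objects(app, "edg:dependsOnDataFrom")


def inferred_dependencies(app):
    """Applications whose programs output data that this app's programs read."""
    needed = set()
    for prog in objects(app, "edg:usesSoftwareProgram"):
        needed |= objects(prog, "edg:input")

    upstream = set()
    for s, p, o in TRIPLES:
        if p == "edg:usesSoftwareProgram" and s != app:
            # o is a program used by another application s
            if objects(o, "edg:output") & needed:
                upstream.add(s)
    return upstream


print(direct_dependencies("app:Reporting"))
print(inferred_dependencies("app:Reporting"))
```

In this sample, app:Reporting depends directly on app:Ledger, while its dependency on app:Risk is inferred only because one program writes the data asset that the other reads.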

The image below shows an example of a lineage model for FRY-9C lineage. On the far right is the activity called FRY9C SECURITIZATION. Immediately to the left is the lineage model that supports the activity. Moving further to the left (upstream) is the application Securitizatiion Application. Clicking on the link will reveal that this dependency is based on an edg:usesSoftwareExecutable property. Continuing upstream there are dependencies based on applications and databases. Orange links depict inferred dependencies based on software program inputs and outputs.

TopBraid EDG Example of a Lineage Model for FRY-9C

Clicking on the link between TOPBANKCORP and AFS Securitizatiion Application shows a detailed view of the dependencies.

TopBraid EDG Detailed View of the Dependencies

The derivation map is shown enlarged below:

TopBraid EDG Derivation Map

Clicking on the link between AFS Securitizatiion Application and Securitizatiion Application shows a detailed view of an example of program input/output dependencies.

TopBraid EDG Program Input/Output Dependencies

Business Activity Lineage

A Business Activity is work performed in an enterprise in support of a Business Function. Business functions may involve a number of business activities. A business activity may also be needed to support another business activity. Activities can be ordered using “next” as a linking property. A Business Activity can require a process for a more detailed specification of how it is performed.
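As a rough illustration of ordering activities through a “next” link, the following Python sketch walks a chain of activities from its starting point. The activity names are hypothetical, and the sketch assumes each activity has at most one “next” activity.

```python
# Hypothetical activities; each key points to the activity that follows it.
NEXT = {
    "Collect Loan Data": "Validate Loan Data",
    "Validate Loan Data": "Compute Exposure",
    "Compute Exposure": "File Report",
}


def ordered_activities(next_links):
    """Return activities in sequence, starting from the one nothing points to."""
    targets = set(next_links.values())
    starts = [a for a in next_links if a not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one starting activity")

    order, seen = [], set()
    current = starts[0]
    while current is not None:
        if current in seen:
            raise ValueError("cycle detected in 'next' links")
        seen.add(current)
        order.append(current)
        current = next_links.get(current)  # None once the chain ends
    return order


print(ordered_activities(NEXT))
```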

Business Functions are the activities carried out by an enterprise; they can be divided into core functions and support functions. Core business functions are activities that either (a) generate revenue, such as the production of final goods or services intended for the market or for third parties, or (b) provide value to external parties in accordance with the enterprise's reason for being. Support business functions are ancillary (supporting) activities carried out by the enterprise in order to permit or to facilitate the core business functions. A business function can be thought of as a “Functional Area”; as such, it is also an alternative term for a business unit.

An example of the lineage of a Business Activity is shown below.

TopBraid EDG Business Activity Lineage Example

Business Area Lineage

A Business Area is a particular operational organizational unit such as Finance, Materials Management, and Customer Service. Granularity of the business areas can vary – business areas can have sub-areas. A business area can also be referred to as a “Line of Business”.

The image below shows an example of a lineage model for a business area for Product Design for Computer-Aided-Design (CAD) in the engineering domain. This shows dependencies on activities and their first level of applications.

TopBraid EDG Business Area Lineage Example

Information Assets

An “Information Asset” is an artefact that is managed by an organizational entity. These include documents, forms, schedules, and any other work-product that is constructed from data assets and used both within and across organizational boundaries.

The image below shows an example of a lineage model that is using applications that import form items.

TopBraid EDG Example of Lineage Model

Derivation Maps

A Derivation Map is a diagram that shows the details for a specific link in a LineageGram. LineageGram is available as a plugin for customers who have purchased Lineage Models as part of their package. Please contact TopQuadrant support to have it installed. A number of derivation maps were shown in the section on Lineage Models. This section describes derivation maps in more detail.

The image below shows how a derivation map can include aspects of a Data Asset, in this case policy compliance and protected data.

TopBraid EDG Derivation Maps Example

An example Lineage Model can be built in a local copy of TopBraid EDG ME (Maestro Edition), that is, running in TopBraid Composer, by following these steps:

  1. Create a new Lineage Model Asset Collection;

  2. Select Import RDF;

  3. Choose the file lineage-model_topbankcorp-fry9c.ttl from the folder in your workspace at edg.topbraidlive.org/1.0/samples/lineagemodels and complete the import;

  4. The assets table should appear with the focus on assets of type Lineage Model;

  5. Go to the single Lineage Model called FRY9C-SECURITIZATION;

  6. From the Explore menu choose Show Lineage.