Don’t Duplicate, Consolidate
Enterprise data integration breeds a system that avoids data mart multiplicity.
Organizations buried by replicated data, inconsistent information and the inability to respond to new requests in a timely manner are typically companies that have decided not to integrate their corporate data. Faced with increasingly constrained budgets, these organizations create what they think are cost-saving data marts.
Now, forward-thinking business leaders are recognizing that a consolidated enterprise data warehouse (EDW) saves a company time, money and customers.
Enterprise data strategy
A corporate enterprise data strategy recognizes that data is a valuable resource. The key is that IT must adopt two important rules, summed up as
- Store once, use many.
- Any question, any data, any time.
The first rule establishes that an extract, transform and load (ETL) process is used to consistently and accurately move the data only once into a high-performance repository. Data is then used, without replication, in many places by any application or tool. Accomplishing this requires a normalized physical schema.
The consistency and accuracy valued in this strategy demands a logical data model (LDM) to define data and its usage based on business rules and user requirements. The operative word, “logical,” means the data model is independent of any physical schema or database technology. The LDM becomes the strategic blueprint for planning and building the integrated enterprise data repository. Applying the LDM to the physical schema eliminates redundant operational jobs and results in a streamlined ETL process. In my experience, up to 70% of ETL activity involves denormalization, rather than transformation, to fit into the specific database schemas for performance.
The second rule establishes that data is a corporate resource and that access to timely and accurate information with user-friendly tools is essential for front-line and back-end employees. Users must be able to ask any question on any data at any time, without IT professionals having to intercept, code and modify the queries. A solution that uses logical views to map data to business usage and applications is preferable.
Logical views offer the best of both worlds: The physical schema allows for rapid data loading without redundancy, and the logical views give the users “virtual” denormalized access to data in familiar structures. The benefit is presenting data via common, recognizable interfaces because activities such as defining the join criteria in the views are hidden from the users and tools.
To commit to these rules that lead to available, accurate and useful information, the company’s data from across the enterprise must be integrated into a universal repository. Unfortunately, organizations rarely approve the budget for a stand-alone enterprise integration project. However, several methods exist that can help organizations achieve their ultimate goals without acquiring specific funding for the project.
Where to start
The initial step in the data integration process is to self-educate. Industry analysts can provide references to companies that have successfully employed an integrated EDW. The next step is to obtain and implement an enterprise LDM so that once it is in place all projects proceed within its framework.
Data integration specifies that each functional area (i.e., data marts) must be built in the same physical database and within the same schema. All overlapping data is stored only once and modeled after the LDM business rules. Strategically, this will take more time and effort up front to create the model and integrate the data, but the resulting benefits—accurate data, confident users, etc.—make up for it.
Click to enlarge
Using the LDM as the strategic map, IT navigates the tactical steps of consolidating its data marts to accomplish the long-term goal of data integration.
Data mart consolidation
Most organizations embark on a data integration strategy because of the tremendous cost savings it offers. Specifically, consolidating data marts can reduce disk, server, operational requirements and people costs.
Professional services studies show that the money saved by consolidating as few as four or five data marts will more than cover the project’s expense. While these savings typically justify the cost of the project, the potential business benefits can add untold value. The major advantage is that integrated data enables users to answer questions that siloed data marts cannot. It allows the second rule—any question, any data, any time—to become reality. Furthermore, the greater the number of consolidated data marts, the greater the savings and potential business impact, which circles back to the first rule—store once, use many.
Most organizations embark on a data integration strategy because of the tremendous cost savings it offers.
Generally, three approaches can be used for a data mart consolidation. (See figure.) In each of these, as the data mart is folded into the EDW, the LDM is employed as the blueprint for the integration:
- Redesign. All data marts are mapped into the LDM, all tables are normalized and any shared data is stored once according to the organization’s business rules. To take advantage of this streamlined approach, the ETL process is restructured so that the data is extracted once, transformed and loaded into the physical schema. Data redundancy, which can occur with multiple data marts, is eliminated, thus resulting in consistent data and significant cost savings.
Although it can yield dramatic results, the redesign method is seldom used because it takes the longest to produce visible results.
Forklift. Tools are used to export table definitions and data from the data marts to a common platform. Re-hosting the marts onto a faster, parallel processing database platform can yield significant performance gains.
Too often, however, IT becomes satisfied with the performance gains from the new platform and is not motivated to continue the data integration steps. Incomplete data integration means cost savings are not realized and the business users do not receive additional capabilities. While it is the fastest method to show results, the integration process must be completed for users to fully benefit.
Forklift and redesign. The data marts are forklifted onto the new platform, and logical views are built to reflect the business usage of the data. These views allow separation of the logical schema from the physical schema and are designed to match how the applications and tools are accessing the data. Then the physical schema is normalized and integrated.
This hybrid method offers the best approach. Because all shared data is stored in common tables, data redundancy is eliminated. Furthermore, rapid data loading is enabled, the ETL process is streamlined and operations are simplified.
Up to 70% of ETL activity involves denormalization, rather than transformation, to fit into the specific database schemas for performance.
As an alternative method to these three approaches, if the organization does not approve a data mart consolidation effort, the company’s strategic goals can be obtained by having all new projects implemented within the enterprise framework. The first application, which is built based on the LDM, is used by each additional application to integrate the ETL and physical data into the EDW.
Ready for the future
A comprehensive enterprise data integration strategy is the solution to IT departments that are buried by replicated data, plagued with inconsistent information and unable to respond in a timely manner to new requests. Implementing a structure that eliminates duplication and ensures consistency enables the IT community to not only address users’ current requests but also satisfy their future data requirements.