Tech2Tech
Applied Solutions 1
The enterprise alchemist
Understanding metadata’s role and value.
by Taher F. Borsadwala
Literally defined, metadata is data about data. It provides details about the data, such as where it came from, who used it, and which resources and targets it is tied to. But organizations that don’t understand the value of metadata wonder why it’s necessary: Enterprises have extensive data stores already, so why would anyone want to add to that bulk by creating a metadata repository? What benefits does metadata provide organizations?
In his book “Building and Managing the Meta Data Repository,” David Marco likens enterprise data to a library and metadata to the catalog. To further that analogy, think about all of the information a library holds in various media: print, audio, video and so on. This information can be organized by author, description, genre, etc.
Without a catalog to hold that information and present it in a usable format, locating the simplest book or video becomes more difficult than finding a needle in a haystack. And the benefits these sources provide could be lost. The same holds true for an enterprise’s data. If the data isn’t organized properly, it can be hard to find and use.
For example, asset management marts and knowledge marts have caught the fancy of many organizations, but these marts are often underused simply because employees don’t know they exist, can’t access them, or can’t find them among scattered intranet sites. As a result, much of a company’s in-house knowledge base goes to waste. The absence of a single touchpoint, or data catalog, is the greatest obstacle to making full use of a company’s resources.

In the attic
Most companies understand the importance of data as an asset, and they work to create and maintain data warehouses to support it. But organizations grow, and mergers and acquisitions happen. Suddenly data silos appear, data sources change and, consequently, duplication and redundancy creep in. The result is an overabundance of unused and underappreciated data.
Eventually, an unkempt data warehouse can resemble an attic, where people simply dump unused items. All too often, those items (i.e., data) are stored without any understanding of what they are, what purpose they serve or why they’re worth keeping. Worst of all, end users can’t find what they need when they need it.
An enterprise data warehouse (EDW) is that attic multiplied n times over: the amount of lost, unused or underappreciated data can grow almost without limit. Without a good pointer system, businesses don’t know what they have or how to use it. This is where metadata comes in. It is the source of information about all of an enterprise’s data, in effect its catalog, and it is essential to managing that data successfully.
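To make the catalog idea concrete, here is a minimal sketch of a single touchpoint for finding data assets. The class names, fields and sample entries (`customer_dim`, `sales_kb`, the intranet URL) are hypothetical and not part of any product; a real metadata repository would hold far richer descriptions.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data asset as described in a hypothetical metadata catalog."""
    name: str
    location: str       # where the asset lives: a table, file share or intranet URL
    description: str
    keywords: set[str] = field(default_factory=set)


class MetadataCatalog:
    """A single touchpoint for finding data assets, like a library catalog."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[CatalogEntry]:
        """Substring match on name and description, exact match on keywords (case-insensitive)."""
        term = term.lower()
        return [
            e for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or term in {k.lower() for k in e.keywords}
        ]


catalog = MetadataCatalog()
catalog.register(CatalogEntry("customer_dim", "edw.customer_dim",
                              "Cleansed customer master", {"customer", "master"}))
catalog.register(CatalogEntry("sales_kb", "https://intranet.example.com/sales-kb",
                              "Sales knowledge mart", {"sales"}))
print([e.name for e in catalog.search("sales")])   # ['sales_kb']
```

The point is less the search logic than the fact that a knowledge mart hidden on some intranet site becomes findable once it is registered in one place.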
A key role
Metadata, and the way it is used, is an integral part of enterprise data management. It gives users valuable information, answering critical questions such as:
- What data assets do I have?
- Where did the data come from?
- What does it mean?
- Is the data quality reliable?
- How fresh is it?
- Is the information timely?
- What is the authoritative source?
- How was this calculated?
- How is the data secured?
- Who owns it?
- Is it in compliance with regulations?
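One way to picture how a repository answers these questions is a metadata record with one field per question. The sketch below is an assumption for illustration only: the field names, the 0.0 to 1.0 quality scale and the sample values are invented, and real repositories model these properties in much more detail.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class AssetMetadata:
    """Hypothetical metadata record; each field answers one question from the list."""
    asset: str                      # What data assets do I have?
    source_system: str              # Where did the data come from?
    definition: str                 # What does it mean?
    quality_score: float            # Is the data quality reliable? (assumed 0.0-1.0 scale)
    last_refreshed: datetime        # How fresh is it? Is it timely?
    authoritative_source: str       # What is the authoritative source?
    derivation: str                 # How was this calculated?
    access_roles: tuple[str, ...]   # How is the data secured?
    owner: str                      # Who owns it?
    compliant: bool                 # Is it in compliance with regulations?

    def is_fresh(self, max_age: timedelta, now: Optional[datetime] = None) -> bool:
        """True if the asset was refreshed no more than max_age before `now`."""
        now = now or datetime.now(timezone.utc)
        return now - self.last_refreshed <= max_age


revenue = AssetMetadata(
    asset="monthly_revenue",
    source_system="billing_db",
    definition="Invoiced revenue per calendar month, net of credits",
    quality_score=0.97,
    last_refreshed=datetime(2024, 3, 31, 6, 0, tzinfo=timezone.utc),
    authoritative_source="billing_db.invoices",
    derivation="SUM(invoice_amount) - SUM(credit_amount)",
    access_roles=("finance_analyst",),
    owner="Finance data steward",
    compliant=True,
)
check_time = datetime(2024, 4, 1, 6, 0, tzinfo=timezone.utc)
print(revenue.is_fresh(timedelta(days=1), now=check_time))    # True
print(revenue.is_fresh(timedelta(hours=12), now=check_time))  # False
```

Whether data counts as "timely" depends on the consumer, which is why the sketch takes the acceptable age as a parameter rather than storing a fixed flag.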
Additionally, metadata is the official record keeper of an enterprise’s tangible and intangible data. (See figure.) Tangible data is the data held in source systems, data stores (databases) and applications, as well as in the tools used for modeling, queries, business intelligence (BI), and extract, transform and load (ETL) functions. Intangible data is the business know-how and subject matter expertise that resides with employees. Typically, this human knowledge isn’t physically stored anywhere, except when it is captured in spreadsheets, documents or presentations.
Data life cycle
Tangible and intangible data coexist and interact. Consider how data that resides in multiple source systems is handled both by the data management program and by human action. For example, tangible data such as Customer Name can have various formats or structures, each stored in a different system. One system stores it as Last Name, First Name, while another stores it as First Name, Initial, Last Name, and so on.
Upon extraction from the sources, this data is loaded into a staging area that mirrors the various source structures. ETL tools then transform these structures into a consistent format, and the result (i.e., Last Name, First Name) is stored in a second staging area that mirrors the final data warehouse structure. Once this data is loaded into the EDW, tools are used to standardize, cleanse and match the existing and incoming data, which leads to the creation of facts, dimensions, master data and cross-references.
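The Customer Name transformation described above can be sketched in a few lines. The format labels (`last_first`, `first_initial_last`) and the staging rows are hypothetical; the sketch simply shows two source-specific layouts converging on the single Last Name, First Name warehouse format.

```python
def to_canonical(raw: str, source_format: str) -> str:
    """Convert a source-specific customer name to 'Last, First'."""
    if source_format == "last_first":            # e.g. "Smith, John"
        last, first = (part.strip() for part in raw.split(",", 1))
    elif source_format == "first_initial_last":  # e.g. "John Q. Smith"
        parts = raw.split()
        first, last = parts[0], parts[-1]        # the middle initial is dropped
    else:
        raise ValueError(f"unknown source format: {source_format!r}")
    return f"{last}, {first}"


# Hypothetical staging rows: (source system, raw value, that system's name format)
staging = [
    ("crm", "Smith, John", "last_first"),
    ("billing", "John Q. Smith", "first_initial_last"),
]
conformed = {to_canonical(raw, fmt) for _, raw, fmt in staging}
print(conformed)  # {'Smith, John'}
```

Note that the format of each source is itself metadata: without knowing which layout a system uses, the transformation cannot be applied correctly.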
Dirty tangible data (e.g., misspellings of the customer’s name) is forwarded to subject matter experts (SMEs) or data stewards who use their business sense—intangible data—to validate or correct it. The resulting clean data is stored in the EDW and provided to various data marts and applications, where it is leveraged by query or reporting tools. End users receive the resulting tangible data—some of which has been corrected by intangible means—typically in the form of spreadsheets.
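The hand-off from automated tools to human stewards might look like the following sketch, which assumes a hypothetical list of validated master names. Python’s standard `difflib.get_close_matches` is used only to suggest candidates; the steward’s business sense still makes the final call, which is exactly where intangible data corrects tangible data.

```python
import difflib

# Hypothetical reference list of validated customer names (e.g. from master data).
MASTER_NAMES = ["Smith, John", "Garcia, Maria", "Nguyen, Linh"]


def route_record(name: str) -> tuple[str, list[str]]:
    """Return ('clean', []) for a known name, else ('review', suggestions) for a steward."""
    if name in MASTER_NAMES:
        return "clean", []
    # difflib only proposes candidates; the data steward decides what is correct.
    suggestions = difflib.get_close_matches(name, MASTER_NAMES, n=3, cutoff=0.8)
    return "review", suggestions


for incoming in ["Smith, John", "Smtih, John", "Doe, Jane"]:
    print(incoming, route_record(incoming))
```

A record such as "Doe, Jane" gets no suggestion at all; it may be a new customer rather than a misspelling, and only a person with business knowledge can tell.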
During its life cycle, the data is stored, touched and used by numerous systems, applications and users. These could include source systems, staging areas, BI tools, SMEs, data stewards, a data warehouse, master data management tools, data marts, various applications, reporting tools, spreadsheets and end users.
The fact that one piece of data can have so many touchpoints makes metadata invaluable. It shows users where the data originated and where it ended up, an end-to-end picture also known as lineage. From this, users can deduce how the data was used or interpreted throughout its life cycle.
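Lineage is naturally a graph of touchpoints. The sketch below records which touchpoint fed which and walks back from a spreadsheet to its origins. The node names are hypothetical stand-ins for the touchpoints listed in the life cycle above, and the data steward is modeled as an input like any other.

```python
class LineageGraph:
    """Records which touchpoint fed which, so data can be traced end to end."""

    def __init__(self) -> None:
        self._upstream: dict[str, list[str]] = {}

    def add_flow(self, source: str, target: str) -> None:
        self._upstream.setdefault(target, []).append(source)

    def trace_upstream(self, node: str) -> list[str]:
        """Every touchpoint upstream of `node`, in depth-first discovery order."""
        seen: list[str] = []
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in self._upstream.get(current, []):
                if parent not in seen:
                    seen.append(parent)
                    stack.append(parent)
        return seen

    def origins(self, node: str) -> list[str]:
        """Upstream touchpoints with nothing feeding them: where the data started."""
        return [n for n in self.trace_upstream(node) if n not in self._upstream]


lineage = LineageGraph()
for src, tgt in [
    ("crm_source", "staging_raw"),
    ("billing_source", "staging_raw"),
    ("staging_raw", "staging_conformed"),
    ("staging_conformed", "edw"),
    ("data_steward", "edw"),
    ("edw", "sales_mart"),
    ("sales_mart", "monthly_report.xlsx"),
]:
    lineage.add_flow(src, tgt)

print(lineage.origins("monthly_report.xlsx"))
```

Asking for the origins of the report surfaces both source systems and the steward, which is the end-to-end picture the text describes.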
Metadata has its own categories within the repository. These different types include:
- Business. Provides direct information that allows end users to locate and understand the business context of data
- Semantic. Offers a business view of information in the data warehouse for knowledge workers
- Design. Captures data about the conceptual, logical and physical design of the data warehouse
- Technical. Facilitates the acquisition of data from source systems into a form appropriate for analytic purposes
- Lineage. Describes when, from where and how data was moved into the data warehouse and what has happened to it since
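A single data element usually carries several of these metadata types at once. As a rough illustration, the sketch below tags hypothetical repository statements about one column (`cust_nm`) with a category and filters by it; the column name and statements are invented.

```python
from enum import Enum


class MetadataType(Enum):
    BUSINESS = "business"    # business context for end users
    SEMANTIC = "semantic"    # business view of warehouse information
    DESIGN = "design"        # conceptual, logical and physical design
    TECHNICAL = "technical"  # how data is acquired from source systems
    LINEAGE = "lineage"      # when, from where and how data moved


# Hypothetical repository entries about one column: (metadata type, statement)
repository = [
    (MetadataType.BUSINESS, "cust_nm: the customer's legal name"),
    (MetadataType.DESIGN, "cust_nm: VARCHAR(120) in edw.customer_dim"),
    (MetadataType.TECHNICAL, "cust_nm: extracted nightly from crm.contacts.full_name"),
    (MetadataType.LINEAGE, "cust_nm: misspellings corrected by a data steward after load"),
]


def entries_of(kind: MetadataType) -> list[str]:
    """Return every statement of the given metadata type."""
    return [text for t, text in repository if t is kind]


print(entries_of(MetadataType.TECHNICAL))
```

Filtering by type is what lets each audience see the slice of metadata relevant to it without losing the link to the same underlying column.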
User groups
This variety of metadata serves a broad user community. In general, every enterprise has three types of high-level users: business, technical and a mix of both, namely the BusTech group. Each group has its own purpose for needing the data and often has its own language. For example, consider how these groups might approach two monthly sales reports. The first report measures sales using geography as a dimension, while the second uses date/time as a dimension.
Business users might refer to these reports as “Sales for the Month of March by Geography” and “Sales for the Month of March by Date/Time.” Technical users, however, would have just one template for the report with changing or dynamic dimensions. Their template name could be “SALES_TMPLT” and the dimension factor could be a dynamic input, say “GEO” and “DT_TM.” Finally, the BusTech users are typically comfortable with either of the reports or naming conventions, or they would create their own.
In essence, all three groups are referring to the same two reports but are calling them something different based on the particular functions of the reports and how the groups will use them. Technical users need to know intricacies such as the source, transformations and so on. Business workers might also need to know such details, but they would use a different set of naming conventions, more attuned to the business world. And the BusTech group might need a mixture of this information, thus creating its own language. Even with all of these name variations, the data is still the same. Metadata catalogs these names in the repository so that anyone searching for the data’s history gets the same results, whatever name they use.
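Cataloging several names for the same report amounts to an alias table. In this sketch the canonical key is assumed to be the technical template plus its dynamic dimension, for example `("SALES_TMPLT", "GEO")`; the month parameter and the `SALES_TMPLT/GEO` naming style are simplifications invented for illustration.

```python
ReportKey = tuple[str, str]  # hypothetical canonical key: (template, dimension)


class AliasRegistry:
    """Maps every group's name for a report to one canonical key, and back."""

    def __init__(self) -> None:
        self._to_key: dict[str, ReportKey] = {}
        self._names: dict[ReportKey, list[str]] = {}

    def register(self, key: ReportKey, *aliases: str) -> None:
        for alias in aliases:
            self._to_key[alias.lower()] = key   # lookups are case-insensitive
            self._names.setdefault(key, []).append(alias)

    def resolve(self, alias: str) -> ReportKey:
        return self._to_key[alias.lower()]     # raises KeyError for unknown names

    def names_for(self, key: ReportKey) -> list[str]:
        return list(self._names.get(key, []))


registry = AliasRegistry()
registry.register(("SALES_TMPLT", "GEO"),
                  "Sales for the Month of March by Geography",   # business name
                  "SALES_TMPLT/GEO")                              # technical name
registry.register(("SALES_TMPLT", "DT_TM"),
                  "Sales for the Month of March by Date/Time",
                  "SALES_TMPLT/DT_TM")

print(registry.resolve("sales for the month of march by geography"))
print(registry.names_for(("SALES_TMPLT", "DT_TM")))
```

A BusTech user’s own name for a report would simply be registered as one more alias against the same key.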
Metadata management
To maintain metadata’s value and usefulness in organizations, managing it must be an ongoing, comprehensive process. This can be challenging, given the various management systems and tools that are used. Though specialized tools perform well in their own domains, integration of these tools can be difficult.
Some time ago, the Object Management Group (OMG), a standards consortium, together with a handful of member companies, saw the potential of standardizing the format of data warehouse metadata. That effort led to various standards for metadata integration:
- Common Warehouse Metamodel (CWM) interfaces can be used to enable easy interchange of data warehouse and BI metadata among data warehouse tools, platforms and metadata repositories in distributed heterogeneous environments.
- Meta Object Facility (MOF) provides an extensible model-driven integration framework for defining, manipulating and integrating metadata and data in a platform-independent manner.
- XML Metadata Interchange (XMI) is used for integrating tools, repositories, applications and data warehouses. XMI provides rules by which a schema can be generated for any valid XMI-transmissible MOF-based metamodel.
As metadata awareness and use grow, these three standards, among others, make it considerably easier to integrate metadata across tools, platforms and repositories.
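The core idea behind these standards, one tool exporting metadata in a neutral format that another tool can import, can be shown with a deliberately simplified XML round trip. This is not XMI and does not follow the CWM metamodel; the element and attribute names are invented, and real XMI documents are generated from MOF-based metamodels with far more structure.

```python
import xml.etree.ElementTree as ET

# A deliberately simplified, tool-neutral XML layout. This is NOT XMI or CWM;
# it only illustrates exporting from one tool and importing into another.


def export_metadata(records: list[dict[str, str]]) -> str:
    root = ET.Element("metadata")
    for record in records:
        ET.SubElement(root, "asset", attrib=record)  # one element per asset
    return ET.tostring(root, encoding="unicode")


def import_metadata(xml_text: str) -> list[dict[str, str]]:
    root = ET.fromstring(xml_text)
    return [dict(elem.attrib) for elem in root.iter("asset")]


records = [
    {"name": "customer_dim", "owner": "CRM data steward", "source": "crm.contacts"},
    {"name": "monthly_revenue", "owner": "Finance data steward", "source": "billing_db"},
]
exported = export_metadata(records)
print(exported)
print(import_metadata(exported) == records)  # True: nothing lost in transit
```

The value of a published standard is that both sides agree on the layout in advance, so neither tool needs custom code for the other.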
Metadata can be integrated and managed in a Teradata environment using Teradata Meta Data Services, a comprehensive and customizable solution. It enables users to load, manage, consolidate, locate and navigate data warehouse metadata and its associated business metadata. And it is the only metadata management solution optimized for and integrated with the Teradata platform.
Realize value
In reality, metadata is much more than just data about data. It is an alchemy that converts enterprise-wide data into knowledge, giving it an identity and meaning. No longer is the data dumped in the attic and forgotten. Creating a metadata repository inside the data warehouse provides users the opportunity to know what data they have, where it came from and how it can help the organization succeed.
Taher F. Borsadwala, a Teradata Certified Master and a certified PMP, specializes in Master and Meta Data Management methodologies. He works for the Teradata Enterprise Data Management Center of Expertise.