Why Teradata
Extreme breakthrough
Meet the challenge of storing and analyzing massive amounts of data.
by James Dietz
Every click of a customer’s mouse, every cell phone call and every sensor measurement on a manufacturing line adds to enormous data sets. While companies are awakening to the great opportunity for new business insight from this data, they also see the huge challenge of squeezing the costs out of analyzing these extreme volumes.
Taken cumulatively over days, weeks and years, storing and analyzing the volumes of data created by these vital business functions can require a data mart or data warehouse of thousands of terabytes in capacity. Until recently, the cost of undertaking such a colossal effort overshadowed the business value and potential benefits to the bottom line.
For example, a large e-commerce Web site can generate more than 50TB of clickstream data each day. If the company’s business analysts want a 90-day view of this activity, they must gather and store about 4.5 petabytes of data (50TB a day for 90 days). But storage is only an enabling requirement. Because company leaders need to derive actionable insight from the data, users also demand the ability to apply business analytics to it. For this company, and many like it in a wide range of industries, the cost of business analytics on this scale has placed the goal far out of reach.
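A quick back-of-the-envelope check of that storage requirement can be sketched as follows. It assumes decimal units (1PB = 1,000TB) and counts raw data only, ignoring compression, indexes and redundant copies; the function name is illustrative, not part of any product.

```python
TB_PER_PB = 1_000  # assumption: decimal units, 1 PB = 1,000 TB


def retained_volume_tb(daily_tb: float, retention_days: int) -> float:
    """Raw storage, in terabytes, needed to keep `retention_days` of data."""
    return daily_tb * retention_days


# The e-commerce example: 50TB of clickstream data a day, kept for 90 days.
clickstream_tb = retained_volume_tb(daily_tb=50, retention_days=90)
print(f"{clickstream_tb:,.0f} TB = {clickstream_tb / TB_PER_PB:.1f} PB")
# prints "4,500 TB = 4.5 PB"
```

Applied to the jet engine example later in this article (3TB a day for roughly 180 days), the same function returns 540TB, in line with the "more than half a petabyte" quoted there.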
But now, evolutionary advances in data warehouse appliance platforms and data storage technologies have made it possible for companies with even limited budgets to capture, store and, most importantly, analyze vast amounts of structured and unstructured data. What’s more, with a specialized large-volume appliance, this can now be done at an order of magnitude lower cost per terabyte than with a classic enterprise-wide data warehouse. Organizations can therefore gain new strategic insight from massive amounts of data with an affordable investment.
The value of large-data analysis

Much of this high-volume data is, by its very nature, less valuable on a per-unit-of-storage basis than traditional enterprise data warehouse (EDW) data. Its value lies in the new behavioral or operational business insight it can provide, for example through deep analytics performed by power users. An EDW, on the other hand, is focused on deriving enormous value from the integration of cross-functional business data sets accessed by enterprise-wide users performing both operational and strategic mixed workloads. Interestingly, the EDW will often be the source for the long-term accumulation of the high-volume data. (See figure.)
Now that affordable large-volume appliances exist that are designed specifically to analyze extreme amounts of data, companies are discovering new perspectives as they recognize their increased potential to:
- Gain unique insight into customer behaviors and consumer trends
- Identify operational inefficiencies and process optimization opportunities on both a macro and micro level
- Pinpoint abnormal behaviors and subtle security gaps
Industry examples
Any business that generates extreme amounts of detailed information within its operation is a good candidate for a large-volume data warehouse. As examples, here are three industries in which companies can use large-volume data analysis to improve their products and boost their bottom line:
Web enterprise
Web-based businesses in the Web 2.0 world can be viewed as information creators and brokers. The data recorded for each click, rollover, search parameter or page view time becomes immense when multiplied by the thousands of visitors a site can have on a typical busy day. This data has tremendous value if actionable information can be unlocked from it. A Web-based enterprise applying deep analysis to these large data volumes can answer previously impenetrable questions, such as:
- Which customers almost purchased an item from the Web site? What items did they browse without buying? What did they do instead? Did they change their mind? Did they buy a similar item on the site?
- How will small or large revisions to the user interface or ad placement change consumer behavior over a 60-day period?
- What short- and long-term visitor activities affect the site’s sales figures?
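To make the first of these questions concrete, here is a minimal sketch of how "almost purchased" might be defined: an item a visitor added to the cart but never bought. The event layout (visitor, event type, item), the event names and the sample data are all hypothetical; a real clickstream schema will differ, and at warehouse scale this logic would more likely be expressed as a query than as Python.

```python
from collections import defaultdict

# Hypothetical clickstream events: (visitor_id, event_type, item_id).
events = [
    ("v1", "view", "camera"),
    ("v1", "add_to_cart", "camera"),
    ("v1", "purchase", "tripod"),
    ("v2", "view", "camera"),
    ("v2", "add_to_cart", "camera"),
    ("v2", "purchase", "camera"),
    ("v3", "view", "lens"),
]


def near_purchases(events):
    """Map each visitor to the items they put in the cart but did not buy."""
    carted = defaultdict(set)
    bought = defaultdict(set)
    for visitor, event_type, item in events:
        if event_type == "add_to_cart":
            carted[visitor].add(item)
        elif event_type == "purchase":
            bought[visitor].add(item)
    # Keep only visitors with at least one carted-but-unbought item.
    return {
        visitor: items - bought[visitor]
        for visitor, items in carted.items()
        if items - bought[visitor]
    }


print(near_purchases(events))  # prints {'v1': {'camera'}}
```

In this sample, visitor v1 carted a camera but bought a tripod instead, which is the kind of "what did they do instead?" pattern the questions above describe.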
The intelligence gained through deep analysis of large-volume data can also provide extended value to traditional retailers that support a Web-based business. Combining the analysis of the Web data with sales from other customer channels, such as the store and call center, provides a complete, integrated view of the entire customer experience. The retailer can then optimize advertising spending and increase the effectiveness of marketing campaigns by personalizing them to each customer’s primary channels.
Manufacturing
Companies that produce complex products generate vast amounts of data every day. Each validation test and stage in the manufacturing cycle creates valuable data that has typically been discarded or highly summarized after it served its short-term purpose. For instance, jet engine production can generate more than 3TB of data per day. Six months’ worth of this data requires more than half a petabyte of online storage, so most manufacturers keep only a sample set or don’t keep the data at all. But a group of engineers can gain tremendous actionable insights from this historical data, such as accurate trending on part failure and maintenance repair.
Saving these myriad data sets and analyzing them over time can allow companies to:
- Improve the accuracy and efficiency of the manufacturing process
- Increase product yields for greater profitability
- Reduce rework and retesting efforts by improving the quality of the product
- Increase data transparency for regulatory compliance
Telecommunications
A call detail record (CDR) contains complete information describing an event or transaction in a telco’s network, including each call on a landline or wireless phone. As the telco’s primary source of customer-experience and network-performance information, CDRs cover multiple kinds of traffic, including voice, Internet, switch and signaling data. Cumulatively, this can add up to billions of records. Valuable information about customer behavior and network performance can be gleaned from the analysis of this all-encompassing data. However, with each CDR taking up more than 400 bytes, saving this data for analysis would require multiple terabytes of storage capacity per day.
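The relationship between record counts and daily volume can be sketched with simple arithmetic. The 400-byte record size comes from the text; the daily record counts below are hypothetical round numbers chosen for illustration, and the calculation assumes decimal terabytes and raw, uncompressed records.

```python
BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def daily_cdr_volume_tb(records_per_day: int, bytes_per_record: int = 400) -> float:
    """Raw CDR volume per day in terabytes, before compression or indexing."""
    return records_per_day * bytes_per_record / BYTES_PER_TB


# Hypothetical daily record counts; actual figures vary by carrier.
for records in (1_000_000_000, 5_000_000_000, 10_000_000_000):
    print(f"{records:,} CDRs/day -> {daily_cdr_volume_tb(records):.1f} TB/day")
```

At 400 bytes per record, one billion CDRs a day comes to about 0.4TB, so a multi-terabyte daily volume implies on the order of five billion or more records a day.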
While many telcos already store large volumes of their CDR data offline for regulatory and compliance reasons, it has historically been difficult to access it for analysis. Using a large-volume data warehouse enables these companies to gain insight and profit from this previously underutilized resource.
Large-volume data analysis can enable telcos to:
- Analyze and optimize network utilization trends over time
- Perform settlement assurance tasks for interconnection agreements and other contractual items
- Ensure regulatory transparency and reporting compliance
- Analyze customer behaviors and trends to gain a sharper sales and marketing perspective
Many other industries also stand to reap the benefits of using a large-volume data warehouse:
- Insurance companies can more accurately assess risk for optimum pricing by analyzing many years of claims data.
- Biotech firms can compare and analyze the results of millions of experiments to help optimize medicines and other products.
- Airlines can collect and analyze complete, detailed profiles for every second of every flight to maximize safety and optimize fuel usage.
A historic opportunity
By combining high-volume storage capabilities with the analytical power of a data warehouse, companies can now cost-effectively conduct deep analysis on vast amounts of data for strategic gain, operational optimization and regulatory compliance.
Information that could never before be practically preserved and analyzed can now be harnessed. This enables company leaders to learn more about their operations and their customers, to fine-tune their organizational processes and to make sound strategic decisions for the future.
James Dietz, platform marketing manager, has been with Teradata more than 14 years.
Photography by Shutterstock