Keith Muller, Teradata Fellow and lead architect for Teradata platforms


Ask the Experts

Hot and Cold Running Data

Hybrid platforms are the next step in the data storage evolution.

The evolution of storage, hardware platforms and database technologies has made it easier for data warehouses to maintain vast historical records, or “cold” data, that can be mined for business insights. At the same time, these technologies are enabling high-performance data warehouses for near real-time and accelerated analytics on “hot” data. What hasn’t been resolved until recently is how to make those data storage methods both cost-effective and nimble enough to allow the kind of multi-temperature data storage and access that many organizations crave.

The future points to a hybrid data warehousing platform that promises the capacity and cost benefits of hard disk drives (HDDs) while leveraging the performance advantage of solid-state drives (SSDs). Teradata Magazine asked Keith Muller, Teradata Fellow and lead architect for Teradata platforms, to explain why data storage will no longer be just a capacity-versus-cost proposition.

Why is data storage such a hot topic these days?

Businesses are moving faster than ever and dealing with vastly greater data volumes than just five years ago. At the same time, we’ve seen huge leaps in business intelligence [BI] technology that make it possible to not only get more out of real-time data but also make the most of longer-term data. How you store and access data is no longer a simple matter.

Traditional HDD technology is cost-effective and offers plenty of capacity, yet it can’t keep up with the data warehouse performance requirements for real-time data analysis. On the other hand, SSDs, which offer extreme performance, aren’t the most cost-effective storage solution for large volumes of historical data.

The value of data and how it’s used often change over time. Instead of forcing everything into the highest-cost/best-performance solution just to ensure you always have the agility to make quick decisions, why not automatically store the data according to its temperature? Hot data that is frequently accessed can live on high-performance drives, while cold data can live on cost-effective, high-capacity drives.

This way, you aren’t under pressure to manually purge or move data to maintain performance. You can keep the details, handle the bigger, cooler data sets as needed and still have the performance you need for hot data. That’s the ideal many companies are striving to achieve.

You mentioned SSDs and HDDs. What are the primary advantages of each?

With hard drives, which are serial devices that handle one input/output [I/O] at a time, the primary driver over the past generation has been lower cost per gigabyte. We’re seeing larger-capacity disk drives, but their performance is barely improving and hasn’t kept up with CPU technology. To match CPU performance, which is driven by parallelism, you’d have to use a larger number of higher-capacity HDDs per server.

The response to this trend is the SSD, which is a parallel memory drive capable of handling multiple queries simultaneously, delivering dramatically higher bandwidth and lower latency relative to its capacity. HDDs offer a cost-capacity advantage over SSDs, so they’re ideal for storing large data volumes that don’t require fast retrieval for processing. Think of the data required for regulatory compliance in the financial industry, for example. You have to keep it even though it’s not usually required for regular reporting or analysis.

As more organizations move to real-time analytics, however, the demand for the lower-latency performance provided by SSDs has increased dramatically. We developed the SSD-based Teradata Extreme Performance Appliance, the industry’s first all-SSD data warehouse, to handle very high-performance workloads on high-value business data. Its “so fast it’s a blur” performance is ideal for the kind of processing that airline reservation systems or online retailers require.

Until now, there’s been a trade-off between cost and performance. You can gain performance by adding more HDDs, but you end up with excess capacity. Alternatively, you can boost performance with SSDs, which have a higher cost-capacity ratio.
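To put rough numbers on that trade-off, here is a minimal sketch in Python. The workload targets and per-drive figures are hypothetical values chosen only for illustration; they are not Teradata specifications or measured results. The sketch sizes a system for both an I/O target and a capacity target, then shows which of the two ends up driving the drive count.

```python
import math

def drives_needed(required_iops, required_tb, drive_iops, drive_tb):
    """Return the drive count that satisfies both the I/O and capacity targets."""
    for_iops = math.ceil(required_iops / drive_iops)
    for_capacity = math.ceil(required_tb / drive_tb)
    return max(for_iops, for_capacity)

# Hypothetical workload: numbers invented for this example.
REQUIRED_IOPS = 40_000
REQUIRED_TB = 20

# Hypothetical per-drive figures: an HDD with modest I/O and large capacity,
# an SSD with high I/O and small capacity.
HDD_IOPS, HDD_TB = 200, 1.0
SSD_IOPS, SSD_TB = 20_000, 0.5

hdd_count = drives_needed(REQUIRED_IOPS, REQUIRED_TB, HDD_IOPS, HDD_TB)
ssd_count = drives_needed(REQUIRED_IOPS, REQUIRED_TB, SSD_IOPS, SSD_TB)

# Capacity bought beyond what the workload asked for in the HDD case.
hdd_excess_tb = hdd_count * HDD_TB - REQUIRED_TB

print(f"HDDs needed: {hdd_count} (excess capacity: {hdd_excess_tb} TB)")
print(f"SSDs needed: {ssd_count}")
```

Under these assumed figures, the HDD count is set by the I/O target and leaves most of the purchased capacity unused, while the SSD count is set by the capacity target.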

Hybrid storage solutions seem to solve this dilemma by offering the best of both worlds.

That’s right. Hybrid data storage platforms will leverage the best attributes of SSD and HDD technologies. Data is stored on the appropriate drive depending on business requirements, and it can be automatically migrated as requirements change. You can really optimize cost, capacity and performance with a hybrid system.

If a customer regularly accesses just 25% of its data, for example, then a hybrid system can store that 25% on the fastest device. The rest of the data can be migrated toward lower-performance devices with a better cost per gigabyte.
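The 25% example can be worked through as simple arithmetic. The sketch below is illustrative only: the total volume and the cost units per terabyte are assumptions made up for the example (including the 10-to-1 price ratio), not actual pricing.

```python
def storage_cost(total_tb, hot_fraction, ssd_cost_per_tb, hdd_cost_per_tb):
    """Cost of keeping the hot fraction on SSD and the remainder on HDD."""
    hot_tb = total_tb * hot_fraction
    cold_tb = total_tb - hot_tb
    return hot_tb * ssd_cost_per_tb + cold_tb * hdd_cost_per_tb

# Hypothetical figures: 100 TB of data, SSD assumed to cost 10 units per TB
# and HDD 1 unit per TB.
TOTAL_TB = 100
SSD_COST = 10
HDD_COST = 1

all_ssd = storage_cost(TOTAL_TB, 1.0, SSD_COST, HDD_COST)
all_hdd = storage_cost(TOTAL_TB, 0.0, SSD_COST, HDD_COST)
hybrid = storage_cost(TOTAL_TB, 0.25, SSD_COST, HDD_COST)

print(f"all SSD: {all_ssd}, all HDD: {all_hdd}, hybrid (25% hot): {hybrid}")
```

With these assumed prices, the hybrid layout lands between the two extremes while still keeping the frequently accessed quarter of the data on the fastest device.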

It’s important to note that data temperature can change over time, so to work well, a hybrid system has to be able to respond to those changes. Balance is an important part of this equation: you can get the best of both worlds with this model as long as it’s appropriately applied to the way the data is used.
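One simple way to picture a temperature that changes over time is a running score that decays when data stops being accessed. The sketch below uses an exponentially decayed access count. This is a generic technique offered as an assumption for illustration; the interview does not say how temperature is actually computed, and the function name, decay factor and access history are all hypothetical.

```python
def update_temperature(previous, accesses, decay=0.5):
    """Fold one interval's access count into a decayed running score."""
    return previous * decay + accesses

# Hypothetical access counts per interval: busy at first, then idle.
history = [8, 8, 0, 0, 0]

temperature = 0.0
scores = []
for count in history:
    temperature = update_temperature(temperature, count)
    scores.append(temperature)

print(scores)
```

The score climbs while the data is in use and falls by half in each idle interval, which is the kind of cooling a hybrid system would need to notice.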

Teradata is developing a hybrid solution. How will it address this balance?

The key to this solution is the automated data migration provided by Teradata Virtual Storage. It considers many factors (data lifetime, data access frequency, data type and so on) and automatically and continuously optimizes the data placement onto the appropriate storage device: SSD or the areas of the HDD that match the data’s temperature. [See figure.]

Figure: Hybrid Storage

By handling storage allocation in the software rather than inside the storage device, we can take advantage of knowing what data is being stored inside a storage block and exactly how that storage will be used over time in order to optimize performance. Everything is transparent to the user since the data automatically migrates to the appropriate storage devices as it cools or heats. It’s not something you have to do manually; your access patterns automatically trigger the necessary changes.
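The idea of access patterns triggering migration can be sketched in a few lines of Python. This is not the Teradata Virtual Storage algorithm, which the interview does not detail. It is a simplified two-tier illustration in which the hottest blocks fill a limited number of SSD slots and everything else goes to HDD. The block names, temperature values and the place_blocks function are hypothetical.

```python
def place_blocks(temperatures, ssd_slots):
    """Put the hottest blocks on SSD until its slots run out; the rest go to HDD."""
    ranked = sorted(temperatures, key=temperatures.get, reverse=True)
    hot = set(ranked[:ssd_slots])
    return {block: ("SSD" if block in hot else "HDD") for block in temperatures}

# Hypothetical blocks with made-up temperature scores.
temps = {
    "orders_today": 90,
    "orders_last_month": 30,
    "orders_2005": 1,
    "audit_log": 2,
}
placement = place_blocks(temps, ssd_slots=1)

# An access-pattern shift: the audit log suddenly becomes the hottest block.
temps["audit_log"] = 120
new_placement = place_blocks(temps, ssd_slots=1)

# Blocks whose tier changed, i.e. the migrations the shift would trigger.
moves = {block for block in temps if placement[block] != new_placement[block]}
print(sorted(moves))
```

In this sketch nothing is moved by hand: recomputing placement after the access pattern changes is what identifies the blocks to migrate.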

What types of companies can benefit from a hybrid solution?

A hybrid solution can deliver a significant performance improvement without increasing data center footprints, breaking the budget or sacrificing energy efficiency. For companies that currently store data in multiple systems due to cost/performance challenges, a hybrid solution gives them the ability to store all of their data in one system. As a result, storage costs and footprint concerns will be much less of a deciding factor in how you manage the data warehouse or develop BI initiatives.

This type of solution will be ideal for certain industries. Financial firms, for example, manage extremely hot data like ATM transactions, securities transactions, event-based offers and much more. At the same time, regulatory demands require that they maintain years’ worth of detailed historical records. Huge volumes of rarely accessed historical data can be too costly to maintain in a performance-oriented system. A hybrid solution could offer the required performance for the hot data yet provide a cost-effective solution for storing the regulatory data.

Retailers also strive for balance between near-term tactical decision making and long-term strategic analysis, as do airlines, transportation providers and governmental agencies—the list goes on and on. Just about any organization in any industry will benefit from the cost and performance advantages delivered by a hybrid storage platform.
