Too Much of the Same?
Data deduplication means less storage, lower costs and better disaster recovery.
Backing up a full database every day or even every week can require an immense amount of storage. When planning a robust backup strategy there are many factors to consider:
- Storing multiple versions of data can be cost-prohibitive, depending on how many copies are required and the price of network bandwidth for site-to-site replication.
- Most data doesn’t change daily, so tables that haven’t been modified for days, or longer, land in the backup environment over and over again.
- Replicating full backups to a disaster recovery site doubles the amount of storage required.
- Copying data to a second site can also be time-consuming, depending on the size of the backups and the speed of the wide area network.
Now the good news: Deduplication devices are becoming popular for compressing and reducing the amount of data stored. Plus, they reduce network bandwidth requirements for disaster recovery replication.
Less Storage, Less Expense
Deduplication technology segments the incoming data stream, uniquely identifies those segments, and then compares them with stored data. If an incoming segment is a duplicate, it’s not stored again. Instead, a reference to it is created. But if the segment is unique, it is stored on disk. The result can be a significant reduction in storage capacity requirements, which has economic benefits.
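The segment, identify and compare steps described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `DedupStore` class, the tiny 8-byte segment size and the use of SHA-256 as the segment identifier are all assumptions chosen to keep the example readable.

```python
import hashlib

SEGMENT_SIZE = 8  # bytes; hypothetical and far smaller than any real device would use


class DedupStore:
    """Toy segment store: each unique segment is kept once, backups are lists of references."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes, stored only once

    def write(self, stream: bytes) -> list:
        """Segment the incoming stream and return the references that describe it."""
        recipe = []
        for i in range(0, len(stream), SEGMENT_SIZE):
            segment = stream[i:i + SEGMENT_SIZE]
            fingerprint = hashlib.sha256(segment).hexdigest()  # uniquely identifies the segment
            if fingerprint not in self.segments:
                self.segments[fingerprint] = segment  # unique segment: store it
            recipe.append(fingerprint)  # duplicate or not, the backup keeps a reference
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild a backup from its references."""
        return b"".join(self.segments[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        return sum(len(s) for s in self.segments.values())


store = DedupStore()
monday = b"AAAAAAAABBBBBBBBCCCCCCCC"
tuesday = b"AAAAAAAABBBBBBBBDDDDDDDD"  # only the last segment differs from Monday
r1 = store.write(monday)
r2 = store.write(tuesday)
print(len(monday) + len(tuesday), "bytes backed up,", store.stored_bytes(), "bytes stored")
```

Both backups can still be restored in full with `store.read(r1)` and `store.read(r2)`, even though the two segments they have in common are held only once.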
Several variations of deduplication are available:
- Variable block sizes create a floating boundary within the data stream so duplicate blocks are easily identified, even if one portion of the stream has changed.
- With a fixed block size, inserting or deleting even a small amount of data shifts every subsequent block boundary, so all of the following blocks look different to the deduplication device, even if the underlying data is mostly the same.
- Inline deduplication occurs in real time, prior to writing data to the storage device, so only the deduplicated data lands in storage.
- With post-processing deduplication, each full backup is stored in its entirety and then analyzed before any data reduction occurs, so it requires more storage and takes longer to complete.
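The difference between fixed and variable block sizes can be seen in a small experiment. The sketch below is a simplified illustration under stated assumptions: the block size, window size and divisor are hypothetical, and a CRC-32 of the most recent bytes stands in for the rolling hash a real product would likely use to place boundaries. It inserts one byte at the front of a data stream and counts how many blocks of the new stream are already in storage.

```python
import random
import zlib

BLOCK = 64    # fixed block size in bytes (hypothetical)
WINDOW = 16   # bytes examined when deciding on a variable boundary (hypothetical)
DIVISOR = 64  # a variable boundary falls roughly once every DIVISOR bytes


def fixed_chunks(data: bytes) -> list:
    """Cut the stream every BLOCK bytes, regardless of content."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def variable_chunks(data: bytes) -> list:
    """Cut the stream wherever the last WINDOW bytes hash to a chosen value."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        window = data[end - WINDOW:end]
        if zlib.crc32(window) % DIVISOR == 0:  # boundary depends only on nearby content
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # whatever is left after the last boundary
    return chunks


def shared(old_chunks: list, new_chunks: list) -> int:
    """Count blocks of the new stream that already exist in the old one."""
    known = set(old_chunks)
    return sum(1 for chunk in new_chunks if chunk in known)


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
modified = b"X" + original  # one byte inserted at the front of the stream

fixed_shared = shared(fixed_chunks(original), fixed_chunks(modified))
variable_shared = shared(variable_chunks(original), variable_chunks(modified))
print(fixed_shared, "of", len(fixed_chunks(modified)), "fixed blocks already stored")
print(variable_shared, "of", len(variable_chunks(modified)), "variable blocks already stored")
```

Because each variable boundary is chosen from the bytes immediately before it, the boundaries move along with the data after the insertion, and every block except possibly the first lines up with one already stored. The fixed boundaries stay at the same offsets, so the shifted data no longer matches.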
Once data is deduplicated, replicating it to a disaster recovery site takes less time, so the site is ready for recovery sooner. Only deduplicated and compressed changes are transferred across the IP network, requiring a fraction of the bandwidth, time and costs associated with traditional replication methods. In addition, some vendors such as EMC use advanced technologies that improve data verification, integrity, system throughput and scalability.
Make the Connection
The Teradata Database evenly distributes data tables across the node and storage infrastructure. Optimal data protection performance can be achieved with a solution that takes advantage of the parallel nature of the database architecture to back up data in the most efficient manner.
The backup hardware infrastructure is fairly traditional, with an Ethernet network connecting all Teradata Database nodes to media servers, which in turn connect to the target storage. Communication and data movement between the database, third-party backup applications and target storage are enabled by the Teradata Extension software plug-in.
The performance rates of deduplication appliances, including the EMC Data Domain platform, are highly variable depending on the interface, number of hard drives, compression/deduplication rates, and database characteristics such as change rate and table type. The number of media servers also impacts the overall performance capabilities. This is especially true when using Data Domain Boost software, since it distributes the processing of deduplication upstream to the available CPUs in the media server or servers. Deduplication rates are also highly variable. Generally, Teradata Database backups achieve a total data reduction rate of between eight and 12 times, after the first full backup.
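To put the eight-to-12-times range in concrete terms, the short calculation below converts a reduction rate into physical storage and percentage saved. It assumes the rate is the ratio of logical backup data protected to physical data stored, and the 100 TB figure is purely hypothetical; the function names are illustrative only.

```python
def physical_storage_tb(logical_tb: float, reduction_ratio: float) -> float:
    """Physical capacity needed to hold logical_tb of backups at a given reduction ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return logical_tb / reduction_ratio


def space_saved_percent(reduction_ratio: float) -> float:
    """Share of capacity avoided, as a percentage of the logical data."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return (1 - 1 / reduction_ratio) * 100


logical_tb = 100.0  # hypothetical total of retained backup data
for ratio in (8, 12):  # the range quoted in the article
    print(f"{ratio}x: {physical_storage_tb(logical_tb, ratio):.1f} TB stored, "
          f"{space_saved_percent(ratio):.1f}% saved")
```

Under these assumptions, moving from eight times to 12 times reduction cuts the physical footprint by a further third, although the percentage saved changes by only a few points.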
Teradata recommends Symantec NetBackup for use with solutions that include Data Domain storage due to the tight integration with Data Domain Boost software. This integration provides increased performance opportunities with distributed deduplication in addition to site-to-site replication managed within NetBackup. Virtual tape library or network file system users will experience approximately half the performance of an optimally configured Data Domain Boost system.
Valuable Data Protection
Data deduplication is a technology that can significantly reduce backup costs for storage and networking while increasing the number of copies companies can store. It also improves disaster recovery readiness. EMC Data Domain systems, now available from Teradata, enable organizations to more effectively protect their valuable data.