Applied Solutions

Too Much of the Same?

Data deduplication means less storage, lower costs and better disaster recovery.

Backing up a full database every day, or even every week, can require an immense amount of storage. When planning a robust backup strategy, there are many factors to consider:

  • Storing multiple versions of data can be cost-prohibitive, depending on how many copies are required or the price of your network bandwidth for site-to-site replication.
  • Most data doesn’t change daily, so tables that haven’t been modified for days, or longer, land in the backup environment over and over again.
  • Replicating full backups to a disaster recovery site doubles the amount of storage required.
  • Copying data to a second site can also be time consuming, depending on the size of the backups and the speed of the wide area network.

Now the good news: Deduplication devices are becoming popular for compressing and reducing the amount of data stored. Plus, they reduce network bandwidth requirements for disaster recovery replication.

Less Storage, Less Expense

Deduplication technology segments the incoming data stream, uniquely identifies those segments, and then compares them with stored data. If an incoming segment is a duplicate, it's not stored again. Instead, a reference to it is created. But if the segment is unique, it is stored on disk. The result can be a significant reduction in storage capacity requirements, which has economic benefits.
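The segment-and-reference idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular appliance works: the class name DedupStore, the fixed segment size and the choice of SHA-256 as the segment identifier are all hypothetical.

```python
import hashlib


class DedupStore:
    """Toy segment store (hypothetical): keeps each unique segment once."""

    def __init__(self, segment_size=4096):
        self.segment_size = segment_size  # assumed fixed size, for simplicity
        self.segments = {}                # digest -> segment bytes

    def write(self, stream: bytes):
        """Store a stream and return the list of references that rebuilds it."""
        recipe = []
        for i in range(0, len(stream), self.segment_size):
            segment = stream[i:i + self.segment_size]
            digest = hashlib.sha256(segment).hexdigest()
            if digest not in self.segments:
                # Unique segment: store it on "disk".
                self.segments[digest] = segment
            # Duplicate or not, the backup only records a reference.
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original stream from its references."""
        return b"".join(self.segments[d] for d in recipe)

    def stored_bytes(self):
        """Physical bytes held, counting each unique segment once."""
        return sum(len(s) for s in self.segments.values())
```

With a segment size of 4, writing b"AAAABBBBCCCC" and then b"AAAAXXXXCCCC" leaves four unique segments (16 bytes) in the store, although 24 bytes were backed up.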

Several variations of deduplication are available:

  • Variable block sizes create a floating boundary within the data stream so duplicate blocks are easily identified, even if one portion of the stream has changed.
  • With a fixed block size, a small insertion or deletion shifts every block boundary that follows it, so all later blocks look different to the deduplication device, even if the underlying data is mostly the same.
  • Inline deduplication occurs in real time, prior to writing data to the storage device, so only the deduplicated data lands in storage.
  • With post-processing deduplication, each full backup is stored and then analyzed before the data reduction occurs, so it needs more storage and takes longer to complete.
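The difference between fixed and variable block sizes can be demonstrated with a small Python experiment. The sketch below is an assumption-laden illustration: the function names, the window and divisor values, and the use of zlib.crc32 over a sliding window (recomputed at each position rather than a true rolling hash) are choices made for brevity, not a description of any vendor's algorithm.

```python
import random
import zlib


def fixed_chunks(data, size=64):
    """Cut the stream at fixed offsets."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data, window=16, divisor=64):
    """Cut wherever the checksum of the last `window` bytes hits a pattern,
    so boundaries follow the content instead of the byte offset."""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if zlib.crc32(data[end - window:end]) % divisor == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def new_chunks(old, new):
    """Count chunks in `new` that were not already present in `old`."""
    seen = set(old)
    return sum(1 for c in new if c not in seen)


rng = random.Random(42)                      # seeded, so the run is repeatable
data = bytes(rng.randrange(256) for _ in range(8000))
shifted = b"X" + data                        # one byte inserted at the front

print("fixed, new chunks:   ", new_chunks(fixed_chunks(data), fixed_chunks(shifted)))
print("variable, new chunks:", new_chunks(variable_chunks(data), variable_chunks(shifted)))
```

After the one-byte insertion, practically every fixed-size chunk is new, while the variable scheme produces at most two new chunks because its boundaries move along with the data.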

Once data is deduplicated, replicating it to a disaster recovery site takes less time, so the site is ready for recovery sooner. Only deduplicated and compressed changes are transferred across the IP network, requiring a fraction of the bandwidth, time and costs associated with traditional replication methods. In addition, some vendors such as EMC use advanced technologies that improve data verification, integrity, system throughput and scalability.
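Why replication needs so little bandwidth can be shown with a short Python sketch. It assumes the remote site can report which segment identifiers it already holds; the function names, the SHA-256 identifiers and the zlib compression step are hypothetical stand-ins, not the behavior of a specific replication product.

```python
import hashlib
import zlib


def digest(segment: bytes) -> str:
    """Identifier for a segment (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(segment).hexdigest()


def plan_replication(local_segments, remote_digests):
    """Return (payloads, bytes_sent, bytes_total).

    payloads maps digest -> compressed segment for every segment the
    remote site does not already hold; everything else is skipped.
    """
    payloads = {}
    bytes_total = 0
    for segment in local_segments:
        bytes_total += len(segment)
        d = digest(segment)
        if d not in remote_digests and d not in payloads:
            payloads[d] = zlib.compress(segment)
    bytes_sent = sum(len(p) for p in payloads.values())
    return payloads, bytes_sent, bytes_total
```

If the second night's backup shares two of its three segments with the first, only the one changed segment crosses the network, and it travels in compressed form.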

Advanced Technology

Teradata offers the EMC Data Domain DD890 system as part of its advocated backup/archive/restore (BAR) solution. This deduplication appliance with integrated hardware and software uses variable block sizes for optimal inline deduplication. Teradata customers receive a tightly integrated and supported backup solution with planning, implementation, mentoring and first-level support. Key features of the Data Domain system include:

  • Data Domain Boost software to distribute parts of the deduplication process across multiple data servers for faster performance and reduced local area network bandwidth requirements. Data Domain Boost is available only for Symantec NetBackup users and can enable up to 50% faster backups than systems running other backup applications.
  • Data Domain Replicator software for asynchronously transferring only encrypted, compressed, deduplicated data over the wide area network for cost-effective, fast, reliable replication.
  • Data Domain Encryption software to protect backup and archived data stored on Data Domain systems with inline encryption algorithms.
  • Flexible interface options, which offer simultaneous use of virtual tape library, network-attached storage and Data Domain Boost.

Make the Connection

The Teradata Database evenly distributes data tables across the node and storage infrastructure. Optimal data protection performance can be achieved with a solution that takes advantage of the parallel nature of the database architecture to back up data in the most efficient manner.

The backup hardware infrastructure is fairly traditional, with an Ethernet network connecting all Teradata Database nodes to media servers, which in turn connect to the target storage. Communication and data movement between the database, third-party backup applications and target storage are enabled by the Teradata Extension software plug-in.

Performance Considerations

The performance rates of deduplication appliances, including the EMC Data Domain platform, vary widely depending on the interface, number of hard drives, compression/deduplication rates, and database characteristics such as change rate and table type. The number of media servers also affects overall performance. This is especially true when using Data Domain Boost software, since it distributes deduplication processing upstream to the available CPUs in the media server or servers. Deduplication rates are also highly variable. Generally, Teradata Database backups achieve a total data reduction rate of between 8 and 12 times after the first full backup.
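To see what a reduction rate of 8 to 12 times means for capacity planning, the arithmetic can be written out in Python. The backup size and retention count below are made-up figures for illustration only, and the function name is hypothetical; the calculation simply divides the logical data by the reduction ratio.

```python
def physical_storage_tb(backup_tb, copies, reduction_ratio):
    """Estimated disk needed to retain `copies` full backups of `backup_tb` TB
    each, given an overall data reduction ratio (e.g. 8 means 8:1)."""
    logical_tb = backup_tb * copies
    return logical_tb / reduction_ratio


# Hypothetical example: 30 retained full backups of a 10 TB database.
for ratio in (8, 12):
    needed = physical_storage_tb(backup_tb=10, copies=30, reduction_ratio=ratio)
    print(f"{ratio}x reduction: {needed} TB of disk for 300 TB of backups")
```

Under those assumed figures, 300 TB of logical backup data would occupy 37.5 TB at an 8-times reduction and 25 TB at a 12-times reduction.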

Teradata recommends Symantec NetBackup for use with solutions that include Data Domain storage due to the tight integration with Data Domain Boost software. This integration provides increased performance opportunities with distributed deduplication in addition to site-to-site replication managed within NetBackup. Virtual tape library or network file system users will experience approximately half the performance of an optimally configured Data Domain Boost system.

Valuable Data Protection

Data deduplication is a technology that can significantly reduce backup costs for storage and networking while increasing the number of copies companies can store. It also improves disaster recovery readiness. EMC Data Domain systems, now available from Teradata, enable organizations to more effectively protect their valuable data.
