Rich Charucki, Teradata Engineering Fellow and Teradata Certified Master
Ask the Experts
Bold Steps Advance Data Warehousing
Teradata Database 14.0 introduces new features, setting the standard yet again.
With each new release, the Teradata Database reasserts its position as the industry-leading data warehousing database, designed for maximum performance, availability and workload management.
Teradata Database 14.0 keeps up this tradition. It enhances and expands existing features while introducing game-changing functionality that promises to set the direction—and the bar—for the rest of the industry.
To learn more, Teradata Magazine spoke with Rich Charucki, a Teradata Engineering Fellow and a Teradata Certified Master.
The last two Teradata releases, versions 13.0 and 13.10, enabled new applications and data types with built-in geospatial and temporal support. What is the focus of Teradata 14.0?
Charucki: Teradata 14.0 is a well-rounded release that focuses on core data warehouse requirements, performance and mixed workload management. In addition to these fundamental areas, we’ve concentrated on developing a series of new industry-compatible functions, key features and capabilities that make it easier to migrate data from Oracle to Teradata.
There’s bound to be a lot of interest in that last area.
Charucki: Absolutely. These 38 new functions extend the capabilities of the Teradata Database. Applications can be seamlessly migrated and will maintain the same functionality as when they were executed on Oracle—except they’ll gain the performance advantages available in the Teradata Database.
Migrations will also be assisted through the use of new Oracle-compatible data types, such as NUMBER. In addition, we’ve made the NUMBER data type more efficient by providing support for variable-length storage of numeric data. So numeric data will only utilize the minimum amount of storage it needs to fit the number being stored. We have also made it possible to change the scale and precision of a NUMBER data type without any physical row modification.
All of these changes are key to simplifying Oracle-to-Teradata migrations.
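The idea of variable-length numeric storage can be illustrated with a minimal Python sketch. It is not Teradata's on-disk format for NUMBER, which the interview does not describe; the helper `minimal_bytes` is a hypothetical name, and signed big-endian integers are an assumed encoding chosen only to show that small values need fewer bytes than large ones.

```python
def minimal_bytes(value: int) -> bytes:
    """Encode an integer in the fewest signed big-endian bytes that hold it.

    Illustrative only: the encoding is an assumption, not Teradata's format.
    """
    length = 1
    while True:
        try:
            return value.to_bytes(length, "big", signed=True)
        except OverflowError:
            # The value does not fit yet, so try one more byte.
            length += 1


# Small numbers occupy less space than large ones.
sizes = {n: len(minimal_bytes(n)) for n in (5, 1_000_000, 10**15)}
```

In this sketch a fixed-width column would spend the same number of bytes on every value, while the variable-length encoding spends only what each value requires.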
What about increased performance in Teradata 14.0?
Charucki: The big performance star is Teradata Columnar, a new physical database design implementation option that allows sets of columns within tables to be stored in separate partitions. Tables can now be column-partitioned, row-partitioned or both. While this sounds simplistic, it’s actually cutting-edge and a competitive advantage for Teradata.
Teradata Columnar can improve query performance via column partition elimination, which reduces the need to access all of the columns in a row. In addition, row partition elimination does the same thing for the rows in a table. You’re basically breaking the data set into smaller pieces and only searching the pieces you need to in order to complete a query.
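A toy model may help make partition elimination concrete. The Python sketch below is an assumption-laden illustration, not Teradata's storage engine: the `store` layout, the quarter-based partition keys and the `scan` function are all hypothetical. It counts how many (column, row partition) pieces a query touches when it needs only two of three columns and one of two row partitions.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: column name -> row-partition key -> stored values.
ColumnStore = Dict[str, Dict[str, List]]

store: ColumnStore = {
    "order_id": {"2011-Q1": [1, 2], "2011-Q2": [3, 4]},
    "amount": {"2011-Q1": [10.0, 20.0], "2011-Q2": [30.0, 40.0]},
    "comment": {"2011-Q1": ["a", "b"], "2011-Q2": ["c", "d"]},
}


def scan(store: ColumnStore, columns: List[str],
         row_partitions: List[str]) -> Tuple[List[tuple], int]:
    """Read only the requested columns and row partitions.

    Returns the result rows and the number of pieces actually read.
    """
    pieces_read = 0
    rows: List[tuple] = []
    for part in row_partitions:
        selected = []
        for col in columns:
            selected.append(store[col][part])  # one piece per column/partition
            pieces_read += 1
        rows.extend(zip(*selected))  # stitch column values back into rows
    return rows, pieces_read


# The query needs order_id and amount for the second quarter only.
rows, pieces_read = scan(store, ["order_id", "amount"], ["2011-Q2"])
total_pieces = sum(len(parts) for parts in store.values())
```

Here the scan reads 2 of the 6 stored pieces: the `comment` column and the first-quarter partition are never touched.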
As part of Teradata Columnar, we are also introducing the new Columnar Auto-Compression capability. This lets you use any of six algorithms to perform compression on the columnar data automatically as the data is being loaded, thus reducing the data storage footprint for Teradata Columnar tables.
Are there other enhancements related to performance?
Charucki: Yes. Teradata Statistics now has its own dedicated parsing engine-level cache, which allows the caching of frequently used histograms and improves overall query optimization time. There’s also the added benefit of being able to control the size of the cache to take advantage of available memory.
Another enhancement was to increase the number of partitions allowed in a partitioned primary index [PPI] scheme from 65,000 partitions in a given table to a staggering 9.2 quintillion. This affords user queries more opportunity to achieve better performance by leveraging a greater degree of PPI partition elimination.
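The two partition limits line up with familiar integer sizes. The short Python check below assumes, without confirmation from the interview, that the old limit corresponds to a 2-byte partition number and the new one to the largest signed 8-byte integer.

```python
# Assumption: the limits correspond to 2-byte and signed 8-byte integers.
max_2_byte = 2**16 - 1   # roughly the "65,000" figure
max_8_byte = 2**63 - 1   # roughly 9.2 quintillion

growth_factor = max_8_byte // max_2_byte
```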
You mentioned Columnar Auto-Compression earlier. What else is new with compression in this release?
Charucki: In Teradata 14.0, we’ve added temperature-based block-level compression. This leverages Teradata Virtual Storage statistics to determine if a data block is hot or cold based on frequency of access. It can take a lot of CPU power to compress and decompress data, and users can experience a big performance hit if they accidentally compress hot data. Now the system can automatically identify and compress cold data, alleviating the burden of having to do so manually.
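The hot/cold decision can be sketched in a few lines of Python. This is only an illustration of the principle: the `Block` class, the `reads_last_period` counter and the `COLD_THRESHOLD` cutoff are hypothetical, and `zlib` stands in for whatever compression algorithm the database actually uses.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    data: bytes
    reads_last_period: int      # hypothetical access-frequency statistic
    compressed: bool = False


COLD_THRESHOLD = 5  # assumed cutoff: fewer reads than this means "cold"


def compress_cold_blocks(blocks: List[Block],
                         threshold: int = COLD_THRESHOLD) -> None:
    """Compress rarely read blocks in place and leave hot blocks untouched."""
    for block in blocks:
        if block.reads_last_period < threshold and not block.compressed:
            block.data = zlib.compress(block.data)
            block.compressed = True


def read_block(block: Block) -> bytes:
    """Return the original bytes, paying decompression cost only if needed."""
    return zlib.decompress(block.data) if block.compressed else block.data


blocks = [Block(b"x" * 1000, reads_last_period=120),   # hot
          Block(b"y" * 1000, reads_last_period=1)]     # cold
compress_cold_blocks(blocks)
```

Only the cold block shrinks, so frequent reads of the hot block never pay for decompression.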
Also new is the independent Primary/Fallback copy compression feature. Teradata Fallback keeps a mirror image of the data in two places on the system so if the primary copy of data is unavailable, you can get what you need from the fallback copy. However, this requires twice the data storage. With this new compression feature, users can compress the fallback data copy and decrease their storage footprint without adding decompression overhead to the normal use of the primary copy.
How is Teradata improving availability in this release?
Charucki: The new Active Fallback feature provides for the next level of fault isolation for disk read errors by not only recovering data blocks from a fallback copy, but also repairing the corrupted primary copy data block from the fallback copy on the fly. So now you can not only read but also write from fallback.
The benefit here is that the system automatically detects errors and self-corrects using the fallback copy, which improves availability by reducing system restarts and/or outages. In addition, the process is completely transparent to the end user. The system administrator gets a report that Active Fallback was instantiated, but the user experiences no interruptions.
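The read-and-repair behavior described above can be sketched as follows. This Python model is an assumption, not Teradata's implementation: the `MirroredBlock` class is hypothetical, and a CRC-32 checksum is used as a stand-in for however the system actually detects a bad read.

```python
import zlib


class MirroredBlock:
    """A data block kept as a primary copy plus a fallback copy (toy model)."""

    def __init__(self, data: bytes) -> None:
        self.primary = bytearray(data)
        self.fallback = bytearray(data)
        self.crc = zlib.crc32(data)   # assumed corruption check
        self.repairs = 0              # what an administrator report might count

    def read(self) -> bytes:
        """Serve the read, repairing the primary from fallback if it is corrupt."""
        if zlib.crc32(bytes(self.primary)) != self.crc:
            if zlib.crc32(bytes(self.fallback)) != self.crc:
                raise IOError("both copies are corrupt")
            self.primary = bytearray(self.fallback)  # repair on the fly
            self.repairs += 1
        return bytes(self.primary)


block = MirroredBlock(b"customer row data")
block.primary[0] ^= 0xFF          # simulate a disk read error on the primary
result = block.read()             # caller still gets the correct data
```

The caller sees no error; the only trace of the event is the repair counter.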
Is Teradata continuing to focus on security?
Charucki: Yes. In fact, we have implemented Teradata Row Level Security as an option in this new release. Security, like compression, is a big topic and a customer concern. Row-level security falls under mandatory access control [MAC], which permits data access based on defined constraints that take precedence over discretionary access control [DAC]. Under DAC, users cannot be denied access to a table they own. But under MAC, users can be barred from accessing data in protected tables they own if they do not have proper clearance.
Teradata Row Level Security enhances our already strong table-level and column-level security capabilities. A system administrator can stipulate that users can only see information that’s relevant to them—that is, they can access the table but not all the rows in the table, even if they created the table. Teradata Row Level Security is a generalized implementation of MAC.
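The difference between owning a table and being cleared to see its rows can be shown with a small Python sketch. The labels in `LEVELS`, the `Row` and `User` classes and the `visible_rows` function are hypothetical names for illustration; the interview does not describe how Teradata defines or stores its security constraints.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical classification levels, ordered from least to most sensitive.
LEVELS = {"public": 0, "confidential": 1, "secret": 2}


@dataclass(frozen=True)
class Row:
    payload: str
    label: str


@dataclass(frozen=True)
class User:
    name: str
    clearance: str


def visible_rows(user: User, owner: str, rows: List[Row]) -> List[Row]:
    """Apply a mandatory rule: clearance decides visibility.

    The owner argument is deliberately ignored, because under MAC
    ownership alone does not grant access to a protected row.
    """
    return [r for r in rows if LEVELS[user.clearance] >= LEVELS[r.label]]


rows = [Row("store hours", "public"),
        Row("salary band", "confidential"),
        Row("merger plan", "secret")]
alice = User("alice", "confidential")
seen = visible_rows(alice, owner="alice", rows=rows)
```

Even though alice owns the table in this example, the row labeled `secret` is filtered out of her result.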
Teradata’s mixed workload management has been an industry-leading differentiator. How has it improved in Teradata 14.0?
Charucki: We’re leveraging some of the capabilities of the new version of Linux, SLES 11, to support the mixed workload paradigm. This includes the management of user workloads through a set of hierarchical tiers and virtual partitioning, where you can allocate portions of the platform to specific user segments, such as departments or countries within a large corporation. Additionally, we have developed a deterministic Priority Scheduler, which makes sure everyone gets a fair share of resources.
It’s also important to point out that we can now manage I/O resources in addition to CPU resources. This feature is called I/O Prioritization. So using the previous example, departments or countries each get a percentage of disk access in addition to the overall CPU resources. If everyone uses the system at the same time, I/O Prioritization makes sure each group gets its fair share.
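A proportional-share calculation gives a feel for what a fair share of disk access means. The Python sketch below is not the Priority Scheduler's algorithm: the group names, the share percentages, the `fair_share` function and the choice to hand idle groups' capacity to the active ones are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def fair_share(shares: Dict[str, float], active: Iterable[str],
               capacity: float) -> Dict[str, float]:
    """Split I/O capacity among active groups in proportion to their shares.

    Assumption: capacity left unused by idle groups goes to the active ones.
    """
    active = list(active)
    total = sum(shares[g] for g in active)
    return {g: capacity * shares[g] / total for g in active}


# Hypothetical departments with assigned percentages of disk access.
shares = {"sales": 50, "finance": 30, "hr": 20}

# Everyone is busy at once: each group gets exactly its percentage.
all_busy = fair_share(shares, ["sales", "finance", "hr"], capacity=1000)

# Finance is idle: its portion is divided between the other two.
two_busy = fair_share(shares, ["sales", "hr"], capacity=1000)
```

With all three groups busy, the 1,000 units of I/O capacity split 500/300/200; when one group is idle, the remaining groups keep their 5-to-2 ratio.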
That doesn’t just sound like an incremental improvement. It actually enables new capabilities.
Charucki: Prior to Teradata 14.0, there was no way for customers to do this; we could limit CPU but not the disk resources. So yes, this represents new capabilities that make it possible to partition system resources for departments, enforce resource sharing and funding models, et cetera. The underlying scheduler in SLES 11 gives us the ability to provide better control and offer more capability.
Teradata 14.0 has clearly taken a bold step into the future of data warehousing.
Charucki: Indeed it has. The industry-compatible functions alone represent a huge time and cost savings for users who would otherwise have to create custom applications to support these functions and also handle NUMBER types, ARRAY types, row-level security and more.
In addition, the introduction of key new features like Teradata Columnar, Columnar Auto-Compression, industry-compatible functions, new data types and Active Fallback takes the Teradata Database and platform to the next level in terms of performance, availability and security. Teradata Database 14.0 represents the next major step in the evolution of data warehousing excellence.