Grand master of the database
The Teradata Optimizer determines the best options for query processing.
The parallel architecture, the foundation of the Teradata Database, is what supports the enterprise data warehouse (EDW). Spreading data evenly over a large number of AMPs, or units of parallelism, makes the Teradata Database fast and powerful. Leveraging the power inherent in this architecture, the Teradata Optimizer is the brain that determines the most intelligent and efficient way to execute each of the unlimited variety of SQL queries.
All but the simplest SQL requests can be processed using various methods. Each will produce the same answer, but they will use different amounts of system resources. Like a chess grand master who projects out many moves and quickly recognizes the best strategy, the Optimizer systematically evaluates query execution plans and selects the optimal approach. It then builds the plan the AMPs follow to execute the query.
Cost-based optimization and statistics
The Teradata Database has a cost-based Optimizer. This means that when determining the best method to process each pending query, the Optimizer evaluates various execution plans, then estimates the “cost,” or amount of resources that will be consumed, for each option. It selects a low-cost plan that enables the system to use as few resources as possible to maximize system throughput.
The Optimizer doesn’t perform a detailed evaluation of every possible query plan—a 10-way join has more than 17 billion processing possibilities. Instead, it uses sophisticated algorithms to identify and select the most promising candidates for detailed evaluation, then picks what it perceives as the best plan among those.
The costing performed by the Optimizer includes:
- Reading data rows, data blocks, index rows and index data blocks
- Calculating expressions
- Writing and sorting data rows
As part of the costing process, the Optimizer identifies the best logical steps to solve the query as well as the physical means to carry out each step. Processor (CPU), disk I/O and interconnect are among the resources it attempts to conserve in building the query plan. The options listed in the table are among the components considered.
The relative costing, or best execution plan, will depend on the hardware configuration. Some systems have high I/O costs, while others have relatively higher CPU costs. To ensure an efficient query plan, the Optimizer relies on information about the system configuration (e.g., number of CPUs and their power, I/O bandwidth, number of database AMPs) as well as the data currently stored in the database. It then uses a set of coefficients with information about each Teradata platform and storage model to calibrate the configuration calculations.
Database administrators (DBAs) issue commands to collect detailed statistics on critical columns within the database. These will provide the Optimizer with additional input about data demographics, such as the number of distinct values, total rows in the table and more. If detailed statistics have not been collected for tables being accessed by a query, the Optimizer implements sampling techniques to provide a glimpse of key table characteristics.
Logical step planning
Determining the order in which multiple tables will be joined is the most important job of the Optimizer. It can then make adjustments to how the query is processed internally. This series of techniques referred to as query rewrites creates a more efficient, but equivalent, execution plan.
For instance, outer joins can be converted to inner joins, and what looks like a single operation in the original SQL query can be broken into multiple steps and performed at different points in the execution plan. An example of this type of query rewrite is the Partial Group By feature, in which the process of grouping data can be divided so that a portion can be done early in the execution and the rest done later.
Access path and join planning
Establishing the access path is usually done in combination with join planning. The access path describes how to retrieve the data needed to satisfy the query. For example, rows in a table may be available either through a full table scan or by using indexed access.
Types of indexes that can be defined in the data warehouse from Teradata include:
- Unique or non-unique primary index (PI)
- Unique or non-unique secondary index
- Partitioned PI
- Join index
The indexes created by the DBAs will be evaluated by the Optimizer for use in accessing the data.
During join planning, the optimal join method, geography and order are determined, as well as whether an existing join index can be used to satisfy part of the query. Join mechanics:
- Perform projections at each step so that only the columns that are necessary for the next step are carried forward
- Filter the data as early as possible so only the needed tuples (the ordered sets of data constituting a record) are carried forward
- Minimize the size of spool files by reducing the number of rows and columns carried forward
Click to enlarge
In this stage of planning, the Optimizer estimates the join selectivity, or how many of the rows from the tables will satisfy the join conditions. This estimation is often improved if good statistics have been collected. The Optimizer also uses table-level statistics to adjust calculated join costs based on the skew expected to occur during query execution.
Aggregations within a query can require data redistribution among the AMPs during processing. Because of this, join and aggregation costing are integrated within the Optimizer so that after taking into consideration any redistribution requirements, the best overall query execution plan is generated. Consequently, what would seem to be a sub-optimal join geography may actually be the best choice if it reduces total data redistribution for both join and aggregate processing.
The Teradata Optimizer is the brain that determines the most intelligent and efficient way to execute each of the unlimited variety of SQL queries.
A winning strategy
An optimizer is a critical part of any relational database system. It determines how efficiently and effectively the database will process SQL requests.
Teradata Optimizer gives the Teradata Database the intelligence to take advantage of its native architecture and available system resources. Benefiting from years of tuning and refinement focused on data warehouse workloads, the Optimizer delivers effective and efficient query performance, just as a practicing novice hopes to one day become the grand master of the chessboard.