Loading...
image

Carrie Ballinger: Senior database analyst, Teradata Certified Master

Tech2Tech

Tech support

In code we trust

Teradata Certified Master answers readers’ technical questions

On May 24, 1844, Samuel Morse defied skeptics by sending a single-wire telegraph message from Washington, D.C., to Baltimore. This marked the birth of electronic communication, which has grown in the last century-and-a-half to become an essential part of our everyday lives.

For example, this column uses e-mail—a sophisticated descendant of the telegraph—to efficiently exchange questions and answers. And, you might be reading this column on the Web, which builds further on Morse’s invention. Here are some technical questions that, no doubt, never even crossed Morse’s mind.

NUPI access of a PPI table

Dear Carrie:

In the September 2009 issue of Teradata Magazine, you wrote:

Single-AMP PI [primary index] access will be degraded significantly if the PI of a PPI [partitioned PI] table is defined as a non-unique PI (NUPI). Unless the PI includes a partitioning column, it is required to be non-unique.

Does this mean I need to change all of my PPI tables to include the partitioning column as part of the PI?

—Starting Over

Dear Starting:

For many PPI tables, it doesn’t make sense to include the partitioning column as part of the PI definition, and it is not required that you change your current PIs to make that happen. Let me explain the conditions under which this degradation occurs, then I will offer options to minimize it.

Uniqueness must be enforced when a new row is inserted into any table defined with a unique PI (UPI). The system does this by checking the other rows already in the database that could carry that same PI value. For a standard table, this check is very quick. But with a PPI table all partitions would have to be examined unless the partitioning columns were part of the PI definition.

The reason all partitions would have to be examined to check for UPI uniqueness (if you were allowed to have a UPI with no partitioning columns) has to do with how the rows of a PPI table are ordered on disk: first by partition, second by hash code. Because any one of the partitions could be holding a row with a duplicate UPI value that matches the row being inserted, all partitions would have to be searched.

image

Click to enlarge

It was determined in the early stages of developing PPI that the overhead of doing UPI uniqueness checking across multiple partitions could slow down the insert process. Thus UPIs were allowed on a PPI table only if the PI included the partitioning column. That meant only one partition needed to be checked for uniqueness.

A trade-off of this decision is that, under some conditions, NUPI access of a PPI table can be degraded. Every time NUPI access is performed, each partition has to be probed to see whether it carries one or more rows with the desired PI value, even if that value has already been found within another partition.

This probing is a single-AMP operation and won’t take long if few partitions—say, 100 or fewer—are defined. Because only a single task performs the probe of each partition on that AMP, probing hundreds or thousands of partitions will slow down the query doing NUPI access.(See figure 1.)

Here are a few techniques that can reduce the probing overhead:

  • Include WHERE clause criteria that passes a single partitioning column value.
  • Define a unique secondary index (USI) on the PI columns if the NUPI column values are physically distinct.
  • Define a non-unique secondary index (NUSI) on the PI columns. This is allowed only for PPI tables, and for this index to be used, the NUSI must be referenced with equality conditions in the query. When used on a PPI table, this type of NUSI will provide AMP-local access and row-hash level locking, rather than the usual all-AMP access that comes with standard NUSI access.

Keep in mind that both the USI and NUSI options will require some additional space for storage. They may also add some overhead when the table is maintained.

Even if you do not have a low number of partitions, test the PPI tables in which you think you will need to do NUPI access to determine the amount of additional NUPI overhead. If PI access is infrequent, and its response time is not critical, it may be fine to leave your index definitions as they are.

Waiting for an AMP worker task

Dear Carrie:

We sometimes run out of AMP worker tasks (AWTs). Recently I’ve noticed that while some queries are waiting for an AWT, other, newer queries are able to get an AWT and begin processing. Is something broken in our system?

Unfair Treatment

Dear Unfair:

image

Click to enlarge

If no AWTs are available, work messages that arrive from the dispatcher on the AMPs are temporarily held in a message queue. Each AMP has one message queue, which, unlike the query delay queue, does not function on a first-in, first-out basis. Consequently, messages waiting in the queue are not treated equally.

When multiple messages are waiting, the request that gets the next AWT will depend on where the message sits on the queue. A message represents one step in a query request. The ordering on the queue is by:

  1. Work type, in reverse sequence (WorkNew is always last)
  2. Priority scheduler relative weight within each work type

This means that messages representing low-priority work will wait longer for an AWT than higher-priority work messages, and work that is starting a new step (WorkNew work type) may wait longer than messages of the same priority that represent spawned work (WorkOne work type). Figure 2 illustrates this sequencing.

The ordering was designed to give work that has already begun and work that is more important an advantage in acquiring AWTs. So it is possible that when there is a shortage of AMPs, some messages will wait longer for an AWT.

Optimizing a query when the dictionary cache is being flushed

Dear Carrie:

When we run a CREATE TABLE or other similar data definition language (DDL), the system sets a flag to spoil the parsing engine (PE) cache. This means the PE is forced to refresh the potentially changed dictionary data. This is all good.

Now, optimizing a complicated query that parses into hundreds of steps might take many seconds. When the dictionary cache is flushed during this parsing and optimizing stage, does the complete parsing/optimizing process have to start over? Or does the parsing and optimizing of the complex query continue even though the dictionary data has changed—which could potentially make the already optimized steps invalid?

Parser, Interrupted

Dear Parser:

I’ll start with a little background on PEs. Each one has a data dictionary cache that holds information on what has recently been used by parsed requests on that PE. This includes information about columns, tables and table headers, views, statistics and so on.

Certain commands, such as COLLECT STATISTICS, cause data structures in these dictionary caches to be removed (i.e., flushed or spoiled) for specific tables that might be affected by the data definition command.

SIDEBAR: Did you know?

Spoiling or flushing the cache: The process of automatically removing query plans and other demographic data from the parsing engine caches when a database object is changed or when its statistics are refreshed.

However, once the Optimizer starts to build a plan, it never starts over. If demographics are changed during this phase, the PE might not pick up the new information, so it’s possible that the query execution plan originally evaluated to be the best approach might end up being less than optimal. For example, newly collected statistics won’t be available for in-flight optimizations, and these new numbers could affect which plan is selected by the Optimizer. However, queries that begin optimization after the cache has been flushed at the end of the COLLECT STATISTICS statement will include all of the current information.

You mentioned a CREATE TABLE, but that command will not remove entries from the dictionary cache because nothing in the cache existed for that table at the time it was created. Set-level updates, like INSERT/SELECT, will cause the dictionary cache to be flushed if the estimates show that 10% or more of the data is being changed. ALTER TABLE will also cause a flush. But none of these commands will cause current optimization processes to begin all over again.

Here is an example of a COLLECT STATISTICS command for a client table. At the completion of the statistics collection process, which includes updating the dictionary tables with this new information, the dictionary cache on each PE will be flushed. This is shown in step 7 below.

During this process, any data contained in the dictionary cache that is related to the client table will be removed from the cache. The next Parser or Optimizer effort on behalf of a query that requires information about this table will access the updated version in the data dictionary.

EXPLAIN

COLLECT STATISTICS ON client COLUMN c_segment;

Explanation

  1. First, we lock CAB.client for access.
  2. Next, we do a COLLECT STATISTICS step from CAB.client by way of an all-rows scan into Spool 3 (all_amps), which is built locally on the AMPs.
  3. Then we save the UPDATED STATISTICS from Spool 3 (Last Use) into Spool 4, which is built locally on the AMP derived from DBC.TVFields by way of the primary index.
  4. We lock DBC.TVFields for write on a RowHash.
  5. We do a single-AMP MERGE DELETE to DBC.TVFields from Spool 4 (Last Use) by way of a RowHash match scan. New updated rows are built and the result goes into Spool 5 (one-AMP), which is built locally on the AMPs.
  6. We do a single-AMP MERGE into DBC.TVFields from Spool 5 (Last Use).
  7. We spoil the Parser’s dictionary cache for the table.
  8. Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

Although step 7 states that only the dictionary cache is affected, both the dictionary and the request cache are flushed on each PE.

TotalIOCount

Dear Carrie:

What is included in the Database Query Log (DBQL) column “TotalIOCount”?

—Don’t Count Me Out

Dear Don’t:

As seen from the AMPs, the TotalIOCount field represents logical I/O issued on behalf of this query. It includes I/O used to scan tables, read indexes or cylinder indexes, and read and write spool files. Although it accounts for most of the I/O performed by the query, TotalIOCount does not capture every I/O for which the query is responsible. Some of the sources of I/O that TotalIOCount does not cover are:

  • I/O that is required during parsing or optimizing
  • BYNET I/O or swap I/O
  • I/O resulting from aborted queries

Because I/Os can be missed in TotalIOCount, it is possible for the physical I/Os that appear in System Activity Reporter (a standard operating system utility) or ResUsage reports to be higher than the sum of TotalIOCount values captured in DBQL. If you need more accurate logical I/O counts at a level above the query—for example, to get information on an application for an entire day—consider using AmpUsage, which provides total usage numbers.

However, TotalIOCount gives a good estimate of how much one query requests an I/O as compared with other queries. Besides being nearly accurate, it is well suited for query-by-query analysis, especially for identifying queries that are consuming high levels of I/O in ratio to CPU.

Merge versus exclusion joins

Dear Carrie:

I’m reviewing some explains on Teradata 12. Within the same explain plan, some join steps are identified as a “merge join” and other steps are identified as an “inclusion merge join.” I assume the merge join is a category of an inclusion merge join, but I’m wondering if there is a difference in how the merge join and inclusion merge join perform.

—Joining In

Dear Joining:

A merge join is not the same as an inclusion merge join because they are executed differently. Simply speaking, a merge join corresponds to an inner join, while an inclusion join has an IN operator (or, sometimes, an EXISTS operator) associated with a sub-query. The exclusion join, another type, is the opposite of an inclusion join, and corresponds to a NOT IN operator (or, sometimes, a NOT EXISTS operator) and a sub-query.

SIDEBAR: Learn More

Find more insight to other technical points of interest in Carrie’s blog on Teradata Developer Exchange at Developer.Teradata.com

A merge join links matching rows across two tables. An inclusion join links a specific row from the outer table with the first matching row from the inner table. Sometimes it doesn’t actually do a join at all, but rather returns the row from the outer table if at least one row matches to the inner table. So an inclusion join may require fewer resources than the merge join because once it finds the first match it stops looking for additional ones.

An exclusion join is used to find rows that do not have a matching row in the inner table. If there are no matching rows, it has to try to join all rows in the opposing table to be sure of that. On the other hand, it can quit looking for additional matching rows as soon as it finds one matching row. If it finds a match, the outer table row it is trying to match to will not be included in the result set.


Your Comment:
  
Your Rating:

Comments