Loading...
Support

Carrie Ballinger, Senior database analyst, Teradata Certified Master

Tech2Tech

Tech Support

Funny business

A Teradata Certified Master answers readers’ technical questions.

I like to laugh. I always feel a smile coming on when I hear the expression “Laughter is the best medicine.” Not only does humor promote healing and offload stress, but I’ve found that it also helps people work together better. In fact, I believe a good laugh promotes creative thinking and helps shake up our minds and bodies. Just about everything goes better with a grin and a chuckle.

While this column—a collection of technical questions and answers—is not intended to be a comedy act, I hope you can detect some light-heartedness between the lines.

FirstStepTime and flow control

Dear Carrie:

When an AMP is in flow control, new messages coming from the dispatcher are not received and must go through retry logic. Those messages must then be re-sent until they can be placed on the message queue.

We use FirstStepTime in the Database Query Log (DBQL) as a way to know when the first step of a query makes it to the AMPs. But what happens to the value in the time stamp if the AMP is in flow control and that message has to be retried several times?

—Retrying to Understand

Dear Retrying:

Messages coming into the AMP will be placed on a message queue if no AMP worker tasks (AWTs) are available to accommodate that work. Flow control is a temporary state that an AMP will enter into when its message queue reaches a specific length. This allows the AMP to take a breath and catch up with the current work before accepting additional work. Messages that cannot be accepted while the AMP is in flow control will be retried by the dispatcher, perhaps multiple times.

Support_figure1_tn

Click to enlarge

FirstStepTime is set in the dispatcher (the final module in the parsing engine) just before the first step of the query is dispatched to the AMPs. The time stamp value indicates when the request context data structure was created—FirstStepTime shows when the first step is sent, not when it is accepted on the message queue. (See figure 1.)

Queries or requests

Dear Carrie:

Does Teradata Active System Management handle requests or queries? All of the materials I’ve read on the product are about queries. Does this mean that estimates for all SQL statements within a single request are added together for the purposes of workload classification and actions? Or does each SQL query within a request get managed individually?

I’ve got tools that generate multi-statement requests (MSRs), and I don’t want those requests penalized compared with other single-statement requests entering the system, in terms of classification and exceptions.

—Requesting Clarity

Dear Requesting:

Teradata Active System Management is request-based, even though some of the documentation refers to queries. An MSR is treated as one request, meaning that all SQL statements in an MSR would be delayed as a single unit, be classified as a single unit or reach an exception as a single unit.

An MSR with several individual statements has always resulted in a single plan, and the request has always been treated as a single parsing unit and a single recovery unit. Teradata Active System Management builds on that precedent.

Classification is based on the optimized plan in the parsing engine (PE), and since at the point of classification there is no knowledge of separate SQL statements, the request arrives as a unit and is processed accordingly.

You are correct that an MSR containing several SQL statements could be classified differently than if those statements were submitted independently. Different classifications occur if the estimated processing time was part of the classification process. The Database Query Log (DBQL) can easily track the workload classification of a given request. If having a larger estimated time in the plan results in classification to a lower-priority workload, and that is problematic for you, look for a way to turn off the MSR option in the tool you are using.

Statistics needed for single-table access

Dear Carrie:

I have a query that retrieves records from a single table based on WHERE clause conditions. If we assume these conditions are based on primary index (PI) columns, non-indexed columns and partitioned primary index columns, does statistics collection help the query?

—Traveling Light

Dear Traveling:

If the access of a single table is by PI with an equality condition, statistics are not used and the Optimizer will choose a single-AMP operation, avoiding a table scan. If the table is accessed by PI with a range constraint, then a table scan will be processed and the Optimizer will rely on having PI statistics (if they exist) to produce a reasonable processing time and row count estimate. You are correct that the plan will not change based on those collected statistics.

Support_figure2_tn

Click to enlarge

However, the accuracy of these estimates might be important. Teradata Active System Management can use estimated processing time to classify queries to different workloads that have different underlying priorities. Even if the plan doesn’t change, more-accurate statistics can ensure that a short query will classify to a higher-priority workload than a long query. Differences in priority can translate to differences in execution time. (See figure 2.) Also, an object throttle option called “step time threshold” allows work to run immediately—and never be delayed—if all of the query steps have estimated processing times that are less than a threshold of seconds. (The threshold is user-defined.) This is a good technique to use when the throttle manages both short queries and long queries, and you don’t want the short work to be delayed. An accurate estimated processing time will help the throttle feature do its job.

More-accurate workload classification might also result if statistics are collected on the partitioning column and on PARTITION, even though those collections might not result in a plan change.

UNIONs in views

Dear Carrie:

My client has a non-Teradata system in which a UNION view combines several answer sets together from different tables. The view also has a literal defined in each SELECT statement that identifies the table name, and a WHERE condition that selects the literal in that view’s SELECT statement. When the UNION view is referenced in a query and the query contains a WHERE condition that specifies a single table name, the view will access only that table.

The customer wants to do the same thing in the Teradata Database. If a literal associated to one out of three tables in the view is present in the SQL, this process works. The plan shows the Optimizer locking and accessing only one of the three tables referenced in the view. What’s curious is that I cannot get this to work if I replace the numeric literals in the view with character literals. When I use characters, the Optimizer will do an all-AMPs retrieve on each of the three tables. I’ve tried various CAST statements on the character literals to make sure they are exactly the same data type in the view and in the query, but I still have no luck accessing only one table using character literals. Can you help me?

These are the view definition, SQL and explain text that show how I got it to work with numeric literals in the view:

REPLACE VIEW xxx AS

SELECT 1 AS tbl_id
,sess_nbr
FROM smmdb.session_hist
WHERE tbl_id = 1

UNION

SELECT 2 AS tbl_id
,sess_nbr
FROM smmdb.sql_hist
WHERE tbl_id = 2

UNION

select 3 AS tbl_id
,sess_nbr
FROM smmdb.usage_hist
WHERE tbl_id = 3;
;

EXPLAIN SELECT *
FROM xxx
WHERE tbl_id = 2;
;
  1. First, we lock a distinct mmdb.”pseudo table” for read on a RowHash to prevent global deadlock for smmdb.sql_hist.
  2. Next, we lock smmdb.sql_hist for read.
  3. We do an all-AMPs RETRIEVE step from smmdb.sql_hist by way of an all-rows scan with a condition of (“2 = 2”) into Spool 1 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 1 by the sort key in spool field1 eliminating duplicate rows. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with high confidence to be 58,786 rows. The estimated time for this step is 3.46 seconds.
  4. We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan with a condition of (“TBL_ID = 2”) into Spool 2 (group_amps), which is built locally on the AMPs. The size of Spool 2 is estimated with high confidence to be 1 row. The estimated time for this step is 0.04 seconds.
  5. Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 2 are sent back to the user as the result of statement 1. The total estimated time is 3.50 seconds.

—View Crazed

Dear Crazed:

This UNION view is working as intended. You are unable to use character data types for SQL statement elimination until your system is updated with the Teradata 13 release.

The reason you can use only integers is based on how satisfiability and transitive closure functionality works prior to Teradata 13. When each SQL statement in the view is examined to see whether to process it, the database pushes down the “where tbl_id = x” that is carried in the query’s WHERE clause. Then the Optimizer sees whether the combination of predicates on each view statement conflict with each other.

If the query said “WHERE tbl_id = 3” and the first statement in the view said “WHERE tbl_id = 1,” then the pushed down/pushed together combination of predicates would say “WHERE tbl_id = 1 AND tbl_id = 3.” This combination would be identified as a conflict and thus unsatisfiable.

When a conflict is identified, that statement is skipped and the next statement in the view is tested until one or more are found that do not conflict after the pushdown. For example, “WHERE tbl_id = 3 AND tbl_id = 3” does not conflict, so that statement in the view would be processed.

Here are the results that you are looking for, taken from my Teradata 13 system:

REPLACE VIEW nationview AS

SELECT ‘one’ AS ID, n_nationkey
FROM nation
WHERE ID = ‘one’

UNION

SELECT ‘two’ AS ID, n_nationkey
from nation2
WHERE ID = ‘two’

UNION

SELECT ‘three’ AS ID, n_nationkey
from nation3
WHERE ID = ‘three’;
;

EXPLAIN SELECT * FROM nationview
WHERE ID = ‘two’;

This query is optimized using type 2 profile T2_Linux64, profileid 21.

  1. First, we lock a distinct CAB.”pseudo table” for read on a RowHash to prevent global deadlock for CAB.nation2.
  2. Next, we lock CAB.nation2 in view nationview for read.
  3. We do an all-AMPs RETRIEVE step from CAB.nation2 in view nationview by way of an all-rows scan with no residual conditions into Spool 1 (used to materialize view, derived table or table function nationview) (all_amps), which is redistributed by the hash code of (CAB.nation2.N_NATIONKEY, ‘two’) to all AMPs. The size of Spool 1 is estimated with low confidence to be 40 rows (1,480 bytes). The estimated time for this step is 0.01 seconds.
  4. We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan with a condition of (“nationview.ID = ‘two’”) into Spool 2 (group_amps), which is built locally on the AMPs. The size of Spool 2 is estimated with low confidence to be 40 rows (1,120 bytes). The estimated time for this step is 0.01 seconds.
  5. Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 2 are sent back to the user as the result of statement 1. The total estimated time is 0.02 seconds.

What is the TIP table?

A transaction-in-process (TIP) table holds information about all of the current requests that are running on each AMP. It keeps track of information such as:

  • Host number
  • Spool, CPU and I/O use
  • Transient journal information
  • U User ID
  • Start and end times of the transaction

The Database Query Log and Teradata Performance Monitor get some of their information directly from the memory-resident TIP table. This table can be accessed only using special internal APIs.

If you see resource usage reported in Teradata Performance Monitor, then rest assured that workload management has visibility to the same detail.

—C.B.

User-defined functions and workload management

Dear Carrie:

My co-worker complained to me recently that a user-defined function (UDF) is not obeying our workload management rules. This UDF is getting a higher priority than it should. Demotions to a lower workload also don’t seem to happen for this UDF. Are UDFs treated differently than normal database processing by the workload rules we set?

—Functionally Literate

Dear Functionally:

UDFs are database objects that can extend the capability of normal SQL within the database. Once compiled, a UDF can be referenced in SQL statements for activities such as enforcing business rules or aiding in the transformation of data.

I’ve found a couple of things about UDFs that are specific to Teradata Active System Management:

  • A step with a UDF will show zero estimated processing time, which means classification based on estimated processing time might not produce the intended results.
  • While the UDF is executing, exception handling (or query milestones) is unaware of the accumulating CPU; however, after the UDF completes its processing for each row, the resource usage is visible and usable for exception handling.

A UDF will update the transaction-in-process (TIP) table on each AMP after the UDF has completed its processing on each row. As it finishes its work for a given row, the UDF makes an internal application programming interface (API) call in order to keep a running tally of CPU and I/O. This means that if each row in an answer set uses a second of CPU time while executing a UDF, that second of time will be attributed to the query and will be seen by workload management when the UDF relinquishes control for that row.

If the UDF is spending a lot of CPU time on one row, that time will not be visible until the UDF relinquishes control of the row and updates the internal database structures. So if a UDF spent an hour processing one row, or if the transaction was caught in an internal loop, workload management would not be able to detect the CPU usage until the base SQL query regains control of that row.

For queries that have UDFs with looping characteristics, you could use query banding to point them to specific workloads that run at a lower priority. You could also set up special views for queries with UDFs and classify those views to specific workloads.


Your Comment:
  
Your Rating:

Comments
 
can u tell me how to find out the free space availabile in each emp's

2/9/2011 7:56:12 AM
— Anonymous
 
How can we do union of resultsets in a view which contains many join and union need to be done in the last four joins? Thanks Saumyasanta Mohanty

10/21/2010 3:21:16 AM
— Anonymous