Senior technical consultant,
Teradata 14 Certified Master
Find the Original Source
A Teradata 14 Certified Master answers technical questions about block-level compression, Timeshare decay in SLES11, cylinder reads and temperature, and more.
Since I was a child, I’ve enjoyed discovering the origin of words. I’ve found that most established words in the English language have their own particular history and can be traced back to a definitive source. The same goes for the words we use in our industry every day. For example, the word “algorithm” has Arabic roots, while “computer” comes from Latin. The word “technology” originated from the Greek words “tekhne,” meaning a craft, and “logia,” meaning systematic treatment.
It often helps when solving technical problems, such as those embodied in the questions I have received and answered for this issue, to step back and consider the history and the source. As with tracing words to their native meanings, finding the origin of technical issues can give you a deeper understanding of what you’re dealing with and how the situation evolved.
Good luck with your problem solving—a term that traces back to the Greek “problema,” meaning obstacle, and the Latin “solvere,” meaning to loosen or release.
Big Blocks and Compression
I have three tables in my test database, and each one has a different maximum block size. The first uses 254 sectors, reflecting my default data block size of 127KB. The second uses 1,024 sectors (512KB) and the third 2,048 sectors (1MB).
I load the data with the Teradata® Parallel Transporter Load utility with block-level compression on all three tables. After the load is complete, I record the average block size from the Ferret utility’s SHOWBLOCKS command. Next, I execute a batch job that performs a series of single-row inserts into the three tables, and then I look at how the average block size has changed.
I was surprised to see that the average size of the data blocks decreased for all three tables after I inserted rows, as shown here. (Editor’s note: the example is shown in the table.)
Since the post-loading average block size for each table was well below its maximum, I thought the data block size would have grown to reach the maximum size allowed and then split at that point. Can you explain why this didn’t happen?
—Block and Blue
When you look at Ferret SHOWBLOCKS output, you are seeing the average block sizes after compression has been applied. For the inserts in the batch job to be performed, the block must first be uncompressed. Then the new row can be inserted, after which the block is recompressed.
Because of how the data was loaded, the data block is already at its maximum allowable size in its uncompressed form. So when a single row is inserted into this uncompressed, maximum-sized data block, the block has to split. Then there are two smaller-sized uncompressed data blocks, each less than the maximum data block size. When these blocks undergo compression before being written out to disk, they each end up smaller than the previous compressed version of the single data block. This will contribute to the average post-compression data block size of the table being lower.
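The arithmetic behind this drop can be sketched in a few lines. The 2:1 compression ratio and 1MB maximum block size below are assumptions chosen for illustration, not Teradata internals; real block-level compression ratios vary with the data.

```python
# Illustrative arithmetic only: the 2:1 compression ratio and the 1MB
# (2,048-sector) maximum block size are assumptions for this sketch.

MAX_BLOCK_KB = 1024          # assumed maximum data block size, in KB
COMPRESSION_RATIO = 0.5      # assumed 2:1 block-level compression

def compressed_size(uncompressed_kb):
    """Size of a block on disk after block-level compression is applied."""
    return uncompressed_kb * COMPRESSION_RATIO

# Before the insert: the load left one block at its maximum uncompressed size.
before = compressed_size(MAX_BLOCK_KB)          # 512.0 KB on disk

# A single-row insert forces the full uncompressed block to split into two
# roughly half-sized blocks, each of which is then compressed independently.
halves = [compressed_size(MAX_BLOCK_KB / 2)] * 2

avg_after = sum(halves) / len(halves)           # 256.0 KB average

print(before, avg_after)  # the average compressed block size drops after the split
```

Each post-split block compresses to well under the size of the original compressed block, which is why SHOWBLOCKS reports a lower average after the batch of inserts.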
Cylinder Reads and Temperature Counts
What is the behavior of Teradata Virtual Storage in terms of the number of access counts it credits to a cylinder when a cylinder read takes place? Does it add up all the data blocks in the cylinder and increment the cylinder’s access counter by that many blocks? Or does the cylinder only get charged for one access?
Teradata Virtual Storage keeps a count of how many times a cylinder has been accessed in order to determine its relative temperature compared to other cylinders. Cylinders with hot data (information that is more frequently accessed) are moved to fast storage, while cylinders with cold data (information accessed less often) are moved to slow storage.
Starting with Teradata Database 14.10, both physical I/Os (reads and writes) and logical I/Os are used to determine the temperature of a cylinder. A cylinder read turns into one physical read (the cylinder) and “n” logical reads (one for each data block on the cylinder). Consequently, a cylinder read will be counted as n +1 I/Os for purposes of determining temperature.
This means that, compared with a straightforward physical read of each data block in a cylinder, a cylinder read charges the cylinder one additional I/O.
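The counting rule above can be expressed directly. The function names are illustrative; only the n + 1 rule comes from the answer.

```python
# Sketch of the Teradata Database 14.10 counting rule described above: a
# cylinder read is charged one physical I/O for the cylinder plus one
# logical I/O per data block it contains.

def cylinder_read_io_count(blocks_on_cylinder):
    """I/Os credited toward cylinder temperature for one cylinder read."""
    physical_ios = 1                    # one physical read of the whole cylinder
    logical_ios = blocks_on_cylinder    # one logical read per data block: n total
    return physical_ios + logical_ios   # n + 1

def block_by_block_io_count(blocks_on_cylinder):
    """I/Os credited when each data block is read individually instead."""
    return blocks_on_cylinder           # n physical reads, no extra charge

n = 100
print(cylinder_read_io_count(n) - block_by_block_io_count(n))  # 1 extra I/O
```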
Timeshare Decay and Workload Demotions
I’m using SLES11 on the Teradata Active Enterprise Data Warehouse 6700 platform. What happens if I have a query running in a workload in the Timeshare Top access level that has been decayed and the same query then hits a workload exception and is demoted to a workload in the Timeshare Medium level? Does the query stay in its decayed state, or does the decay threshold resource accumulation start over?
Decay is an option that automatically applies a priority-lowering mechanism to all Timeshare queries that have reached a specified threshold of resource usage. Once the threshold—which can be for CPU or I/O—is reached, the query’s access rate, which determines Timeshare query priority, will be cut in half. (See figure.) For example, a query running in Timeshare Top will have its access rate reduced from 8 to 4 if it consumes more than 10 seconds of CPU or 100MB of I/O resources on a given node.
Timeshare decay functionality and Teradata Active System Management workload demotions work together. Resources accumulated by a query will stay with the query when it is demoted. This is true whether the demotion is coming from a workload in the Tactical tier or the Service Level Goal (SLG) tier, or from another Timeshare access level. If the accumulated resource is larger than the Timeshare decay threshold at the time of the demotion, the query will be decayed immediately from within the Timeshare workload where it was just demoted. The CPU time and I/O bandwidth accumulated by a request are a piece of the session context that does not get modified when a change to a different workload takes place.
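The interaction can be sketched as follows. The thresholds (10 CPU seconds, 100MB of I/O per node) and the Timeshare Top access rate of 8 come from the answer above; the class itself is a minimal illustration, not a Teradata implementation.

```python
# Minimal sketch of Timeshare decay carrying across a workload demotion.
# Thresholds and access rates are taken from the example in the text.

CPU_THRESHOLD_SEC = 10.0    # per-node CPU decay threshold
IO_THRESHOLD_MB = 100.0     # per-node I/O decay threshold

class TimeshareQuery:
    def __init__(self, access_rate):
        self.access_rate = access_rate
        self.cpu_used = 0.0
        self.io_used_mb = 0.0
        self.decayed = False

    def accumulate(self, cpu_sec, io_mb):
        self.cpu_used += cpu_sec
        self.io_used_mb += io_mb
        self._check_decay()

    def _check_decay(self):
        # Once either threshold is crossed, the access rate is cut in half.
        if not self.decayed and (self.cpu_used > CPU_THRESHOLD_SEC
                                 or self.io_used_mb > IO_THRESHOLD_MB):
            self.access_rate /= 2
            self.decayed = True

    def demote(self, new_access_rate):
        # Accumulated CPU and I/O travel with the query into the new
        # workload; if they already exceed the threshold, the query is
        # decayed immediately in the workload it was demoted to.
        self.access_rate = new_access_rate
        self.decayed = False
        self._check_decay()

q = TimeshareQuery(access_rate=8)     # Timeshare Top
q.accumulate(cpu_sec=12.0, io_mb=0)   # crosses the CPU threshold: 8 -> 4
q.demote(new_access_rate=4)           # demoted to Timeshare Medium...
print(q.access_rate)                  # ...and decays again immediately: 2.0
```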
USECOUNT Logging by Database, Not User
I am using Teradata Database 14.10 and am having difficulties with Database Query Log (DBQL) USECOUNT logging. I checked the DBS Control parameter ObjectUseCountCollectRate, and it is set to the recommended 10 minutes. I issued the BEGIN QUERY LOGGING command explicitly with USECOUNT for the user who will be issuing queries. After running some queries and doing inserts for more than a 10-minute period, I still don’t see anything reflected in the DBC.ObjectUsage table.
USECOUNT logging keeps track of objects that are accessed within the database being logged, including the number of updates, deletes or inserts performed against the tables.
USECOUNT logging is performed on all the tables owned by a database, so your logging statement will be more productive if you specify a database instead of a user. Users can, but typically don’t, own tables or other objects. Contrary to how it might seem, enabling USECOUNT logging by user doesn’t cause counts to be recorded for every object the user touches. For example, if User John needs statistics and counts collected for Database Mktg, USECOUNT would need to be enabled for Database Mktg, not User John.
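Using the example above, the logging statement would name the database rather than the user. Mktg here is just the example database name; substitute your own.

```sql
-- Enable object use counts for the database, not the user.
BEGIN QUERY LOGGING WITH USECOUNT ON Mktg;

-- After the ObjectUseCountCollectRate interval elapses, access counts
-- for objects in Mktg appear in DBC.ObjectUsage.
```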
My Teradata Developer Exchange blog post has a section on USECOUNT logging and offers an example of the syntax to use that might help you.