Tech2Tech

Applied Solutions

Rows vs. Columns: Why Not Both?

Teradata Columnar allows row, column or hybrid reads for ultra-efficient queries.

Teradata Columnar, introduced in Teradata Database 14.0, offers a new approach for organizing the data of user-defined tables and join indexes on disk. Teradata’s unique implementation of columnar has the ability to partition a table or join index by column. This column partitioning feature can be used alone in a single-level partitioning definition or with row partitioning, enabling queries to access data by column, row or both.

The hybrid row and column feature provides a new paradigm for storing data that changes the cost-benefit trade-offs of physical database design choices and their combinations. Teradata Columnar also benefits businesses by reducing I/O for certain queries while simultaneously decreasing space usage.

Column-Partitioned Characteristics

A table or join index that is partitioned by column has several key features:

  • No primary index
  • Column partitions composed of single or multiple columns
  • Column partitions, each of which usually contains multiple physical rows
  • Columnar storage and autocompression implemented with a new physical row format column called a container
  • Row-storage implementation that may have physical rows with row format called a subrow

Divide and Conquer

Row partitioning in the Teradata Database allows users to group rows horizontally. Each row partition contains a subset of the total rows in a table, such as one day of transaction data. Users can define one or more levels and define a partitioning expression for each level. If they have a query that specifies a value or a range of values for the partitioning column, fewer rows need to be accessed, compared with a full table scan.

With column partitioning, each column or groups of columns in a table will become a partition containing the values of that specific grouping. This is a simple approach because there is no need to define partitioning expressions, and determining partition elimination is very easy. Here is an example that creates a column-partitioned table that vertically partitions the data:

CREATE TABLE Sales_CP (
    TxnNo INTEGER,
    TxnDate DATE,
    ItemNo INTEGER,
    Quantity INTEGER )
  PARTITION BY COLUMN,
  UNIQUE INDEX (TxnNo);

A primary index is not specified, so this is a no primary index (NoPI) table. Each column is its own partition, so every partition value is just a value of that individual column.

Efficiencies of Column Partitioning

The key benefit in defining row partitioning for a table is when queries access a subset of rows based on constraints on the partitioning columns. Using column partitioning improves the performance of queries accessing a subset of the columns, either for predicates or projections. Because sets of columns can be stored in separate partitions, only those containing the columns referenced by the query need to be scanned.

The advantages of column and row partitioning can be combined in Teradata Columnar, further reducing I/O. Fewer data blocks need to be read, since more information is packed into the blocks. For example, the following shows how to get a list of items sold on May 29, 2011:

CREATE TABLE Sales_CPRP (
    TxnNo INTEGER,
    TxnDate DATE,
    ItemNo INTEGER,
    Quantity INTEGER )
  PARTITION BY (
     COLUMN,
     RANGE_N(TxnDate
       BETWEEN DATE ‘2011-01-01’
         AND DATE ‘2011-12-31’
        EACH INTERVAL ‘1’ DAY) ),
  UNIQUE INDEX (TxnNo);
Figure 1: Column and Row Partitioning

Click to enlarge

With both column and row partitioning defined on a table, the query needs to access only the column partitions containing essential items. (See figure 1.)

Another way to look at the advantages of partitioning is to contrast the data accessed when different types of partitioning are defined. For example, this table definition has various physical database designs:

CREATE TABLE mytable (
 A INT, B INT, C CHAR(100),D INT, E INT,
 F INT, G INT, H INT, I INT, J INT, K INT,
 L INT );

The query below is based on the previous table called mytable:

SELECT SUM(F) FROM mytable WHERE
B BETWEEN 4 AND 7;

In figure 2, only columns F and B are referenced by the query, even though the table has 12 columns. The figure illustrates the data that has to be accessed when there is:

1} no partitioning

2} row partitioning on column B

3} column partitioning

4} both column and row partitioning

If the table has 4 million rows, the query reads approximately this many data blocks:

  • 9,987 with no partitioning
  • 4,529 with row partitioning
  • 281 with column partitioning
  • 171 with column and row partitioning
Figure 1: Column and Row Partitioning

Click to enlarge

Decreased I/O usually comes with higher CPU usage. Since I/O is often relatively expensive compared with CPU, and CPU is getting faster at a much higher rate than I/O, this is usually a reasonable trade-off.

Fewer Touch Points

Teradata Columnar offers the ability to partition a table or join index vertically by column, horizontally by row or a hybrid of both. To eliminate the need to access all data in all rows and columns, Teradata Columnar breaks the table into smaller sections so only the individual column data for the proper rows is scanned to complete a query.


Your Comment:
  
Your Rating:

Comments
 
Very Good .....

6/20/2014 3:32:59 AM
— Anonymous
 
Thank you for such a beautifully explained article.

6/11/2014 2:38:49 AM
— Anonymous
 


5/15/2014 1:35:57 AM
— Anonymous
 
Very Nice Article!!

4/2/2014 3:34:21 AM
— Anonymous
Datawatch Q4-2014