Digital Landscape

Tech2Tech

Hands On

Reach Deep Into History

Leverage deep history data across databases and data warehouses with Teradata QueryGrid to uncover new business value.

While Teradata® QueryGrid™ allows you to access data and enables processing across heterogeneous systems, including technologies from Apache™ Hadoop®, Teradata Aster, Oracle or even MongoDB, some of the most compelling uses involve multiple Teradata Database systems. When organizations use several Teradata platforms for various purposes, having direct SQL access across them, along with the ability to orchestrate processing between them, opens up new possibilities.

Gain New Insights From Historical Data

Increasingly stringent regulations require companies to keep data online and accessible for regulatory compliance over several years or more. Although the most frequently accessed data is the latest or most current data, that doesn’t mean that the older information is not useful or relevant. Data that’s been compiled over several years gives a rich perspective of the business, such as long-term trends and cyclical patterns.

Because there is typically much more history data than current data, and the concurrency and usage of historical information is substantially less, it makes sense to store it on a separate system that has different performance and price characteristics: for example, on a Teradata 1000 or 2000 series warehouse.

However, keeping historical and current data on separate systems has made it a challenge to gain unique insights that are possible only by analyzing the information together. But not any longer. Now, Teradata QueryGrid can be used to seamlessly join together all the historical and current information across multiple Teradata systems, without having to change the basic data structures and queries. This makes it possible to answer questions that could not be previously addressed so decision makers can better plan for the future.

The business can generate a basic report that is run on the past year’s data stored on the integrated data warehouse (IDW). The results are shown in the table below:

Hands on table
SELECT sales_date, SUM(sales_quantity) AS total_sales
FROM samples.sales_fact
GROUP BY 1
ORDER BY 1;

The query results returned 334 rows in 1.5 seconds. Now, if the business wants to run a full report on all of its data, including the historical data, Teradata QueryGrid can query information from all available years across the IDW and the historical data located on another Teradata system. The data on the historical database has the exact same column structure, but it is in a table called: sales_fact_history.

By using a simple UNION to join the data across the systems, the Teradata QueryGrid foreign server object we have created in this example is called td1000:

SELECT sales_date, SUM(sales_quantity) AS total_sales
FROM (
	SELECT * FROM samples.sales_fact
		UNION ALL
	SELECT * FROM samples.sales_fact_history@td1000) all_sales
GROUP BY 1
ORDER BY 1;

The query plan, which executes using Teradata QueryGrid, followed these steps:

  • The query was initiated from the IDW.
  • The local query on the IDW ran to select qualifying rows.
  • A remote query on the td1000 ran to select qualifying rows.
  • All rows were returned from the td1000 and placed in a spool on the IDW.
  • The IDW merged both data sets and applied aggregation to all rows.
  • The IDW applied grouping and ordering.

This Teradata QueryGrid query resulted in 1,336 rows, and 14 million rows were transferred back to the IDW. The query took about 30 seconds to complete.

Optimize With Push-Down Processing

Just as you can optimize a query on a single system to perform better, you can also optimize Teradata QueryGrid queries. You need to take into consideration the performance of the individual query pieces that will happen on each system as well as the performance of the network between them.

One of the most powerful features of Teradata QueryGrid is its ability to orchestrate processing across multiple systems and “push down” the processing when desired. This revised sales report query leverages that capability:

SELECT sales_date, SUM(sales_quantity) AS total_sales
FROM samples.sales_fact
	GROUP BY 1
UNION ALL
SELECT *
FROM FOREIGN TABLE (
	SELECT sales_date, SUM(sales_quantity) AS total_sales
	FROM samples.sales_fact_history
	GROUP BY 1)@td1000 old_sales
ORDER BY 1;

Economically Scale Your Database Environment

Teradata® QueryGrid™ for Teradata-to-Teradata is a high-performance parallel connector used between multiple Teradata Databases. It allows you to economically scale your database environment by adding platforms with different characteristics and costs to match the usage and business value of your data. The solution even provides an integrated EXPLAIN plan that allows you to easily see exactly what will happen on each system before you run your query, making it much easier to optimize your multi-system queries.

To utilize Teradata QueryGrid for push-down processing, you use the keywords “FOREIGN TABLE.” This lets you initiate a subquery on the secondary system, which is everything shown in the parentheses in the preceding query.

In this case, the 1000 series system aggregates the results for its data and sends just the results instead of all the raw data rows. This allows you to minimize the data transferred across the network as well as use the processing power of that system.

The query plan for this push-down query using Teradata QueryGrid followed these steps:

  • The query was initiated from the IDW.
  • A local query on the IDW ran to select qualifying rows: sales_quantity aggregated.
  • A remote query on the td1000 ran to select qualifying rows: sales_quantity aggregated.
  • Qualifying rows were returned from the td1000 and placed in spool on the IDW.
  • The IDW merged both data sets.
  • The IDW applied ordering.

The push-down version of this Teradata QueryGrid query resulted in 1,336 rows, with just 1,002 rows transferred. The total elapsed time was about four seconds. As the results demonstrate, there is a dramatic increase in performance when using the push-down capabilities of Teradata QueryGrid. But, as with any optimization, results will depend on your particular environment, such as your systems, data and network. You may also want to use push-down processing to leverage idle resources in order to free up capacity on the IDW, even if the overall performance is slower.

Uncover More Value

Using the push-down capabilities of Teradata QueryGrid lets you orchestrate queries to fit your business needs and data architecture. The solution enables seamless, high-performance, multi-system analytics while supporting many different platforms.

Leveraging a company’s deep historical data to uncover new answers to business problems is just one of the ways you can use Teradata QueryGrid across multiple database systems. As more businesses continue to adopt the solution, they will find more ways it can help them uncover insights and get even more value from all their data.

Andy Sanderson is the product marketing manager for many of the Teradata® Unified Data Architecture™ products, including Teradata QueryGrid™.


Your Comment:
  
Your Rating:

Comments
Fuzzy Logix