Loading...

Tech2Tech

Applied Solutions 3

Inside, not out

SAS Data Integration processes data faster inside the Teradata Database

SAS Data Integration (DI), an enterprise solution for data integration and data quality, has new and enhanced capabilities for managing data inside the Teradata Database. With SAS DI, users can leverage a single, flexible tool to collect data from a variety of sources, including relational databases, flat files and SAS data sets, and move that data into and out of the Teradata Database. It provides familiar constructs for SAS users so they can either develop their DI processes via graphically defined DI workflows or directly manipulate SAS code.

Additionally, by developing workflows using the extract, load and transform (ELT) technique, users can maintain the level of control available in SAS while leveraging the performance and scalability of the Teradata Database to transform the data using Teradata SQL. The ELT approach minimizes data movement and leverages parallel processing for data transformation. (See figure 1.) Using SAS DI and ELT can save days of processing time for large data volumes.

image

Click to enlarge

Working inline with the Teradata system, the speed, flexibility and power of SAS DI make it especially ideal for business analytics data preparation. It also supports enterprise data warehousing initiatives, such as master data management, data migration, data quality and data governance initiatives.

Users can employ six key capabilities of SAS DI to leverage the combined powers of the Teradata Database and SAS for faster data integration.

1. First, Consolidate

A best practice is to consolidate data from source systems and load it up front into the Teradata Database. SAS DI contains a library of standard transforms, including one for extract. The extract transform allows you to select tables, columns and rows to extract from various sources. Among others, the sources can be any database management system, file format or SAS data set. The result is a view that simplifies future load and transform steps and supports future in-database calculations of these root data elements.

2. Use Teradata Loaders

SAS provides Teradata Table Loader to load data from any data source SAS can access into the Teradata Database using new Teradata Parallel Transporter technology to achieve high performance loads. This technology offers many benefits. For example, if a load is interrupted, Teradata Parallel Transporter allows the user to restart it at the record level.

Teradata Table Loader also supports other utilities, including MultiLoad, FastLoad and TPump. To effectively apply Teradata Table Loader, the user should consider the appropriate technique for the data being loaded, e.g., FastLoad for a new, empty table; MultiLoad for adding data to already populated tables; and TPump for continuously moving data into the tables. The user can also direct SAS DI to choose the appropriate method at load time.

3. Push processing down

Once the data is loaded, SAS DI can leverage SQL transformation logic that runs inside the Teradata Database. SAS DI has two key in-database transforms: extract and SQL join, a multi-purpose transformation that supports queries, filters, extracts, aggregations, joins and more. Both transforms leverage SQL Passthru to produce the SQL that is executed in the Teradata Database. The SQL Passthru option instructs SAS DI to request that SAS submit the SQL to the database for processing. When this option is set, SAS will not process the SQL.

The SAS interface known as expression builder can be used to define arguments for a SQL-select statement and Teradata-centric SQL transformations, such as data mappings and aggregations. The user can point and click to build the expression or enter one directly into the expression builder, with the option to include native functions from the Teradata folder. SAS makes it easy to employ these functions by providing links to the Teradata function reference documentation.

image

Click to enlarge

4. Redirect SAS temporary views/tables

By default, the SAS DI extract transform creates its output views and tables in a temporary space referred to as SASWORK. This SAS library is stored outside the Teradata Database. However, when applying an ELT approach, the user will want to instruct SAS DI to redirect these temporary tables and views to the Teradata Database. A user can set this individually at the table level by choosing an alternate library or globally by selecting “Alternative Library for Temporary Tables” in SAS DI under the Tools, Options menu.

5. Visualize in-database workflows

SAS offers the flexibility to execute SQL within both the SAS engine and the Teradata Database. When designing ELT processes, users can click on the “Check Database Processing” option to visualize what will run in the database. SAS DI will analyze whether any SAS-specific functions or formats are being used, and will mark transforms with a letter “T” if they are ready to run completely inside the Teradata Database. This helps users design ELT processes that will run completely in the database.

6. Embed analytics in the ELT process

SAS users often produce valuable insight leveraging advanced analytic techniques. They can embed these analytics inside the ELT process and enrich their data, making it more useful for the organization.

Another SAS product, SAS Scoring Accelerator for Teradata, allows SAS scoring algorithms to be published inside the Teradata Database as a user-defined function. Integrated with SAS DI, users might call these SAS capabilities inside the Teradata Database to dynamically enrich customer profitability data with new predictions from SAS models. Also, any user defined SAS formats that are published to the Teradata Database can be used as well.

Optimized data integration with SAS and Teradata

SAS DI allows users to leverage the power and flexibility of the SAS platform and take advantage of the database capabilities of the Teradata platform. The product provides an easy way to manage complex, multi-step workflows to prepare analytic data, mixing the power of SAS with the speed of the Teradata system. Using ELT-style techniques enables SAS DI users to quickly develop optimal data preparation workflows, minimize data movement, reduce overall storage requirements and significantly improve overall data integration performance.


Your Comment:
  
Your Rating:

Comments
 


12/12/2011 4:42:04 AM
— Anonymous