Loading...

Tech2Tech

Applied Solutions 4

Another option on the table

The ANSI merge feature loads and transforms data without rollback.

An effective data load strategy is important for successful data warehousing. While many DBAs use MultiLoad, the ANSI merge feature may take the load strategy into a new era. That’s because unlike MultiLoad, it works with features like:

image

Click to enlarge

  • Unique secondary indexes (USIs)
  • Triggers
  • Referential integrity
  • Join indexes

The basics

Merge is ANSI-standard SQL syntax that can perform bulk operations on tables using the extract, load and transform (ELT) function. (See figure.) These opera­tions merge data from one source table into a target table for performing massive inserts, update and upserts. So why use the merge function when an insert-select will work, or when an update join will suffice? The answer is simple: better performance and the added functionality of executing a bulk, SQL-based upsert.

Beginning with Teradata 13, the FastLoad target table can be a “No-PI” table. This type of table will load data faster because it avoids the redistribution and sorting steps, but this only postpones what eventually must be done during the merge process. The “merge target table” can have a predefined error table assigned to it for trapping certain kinds of failures during the merge process.

Consider the following ANSI merge syntax. Although a merge for single-row processing can be written, this example will focus only on bulk operations to illustrate the basic syntax structure for a merge:

Example 1:
MERGE INTO Department
USING NewDept AS ND
ON nd.Department_Number = Department.
Department_Number
WHEN MATCHED THEN UPDATE
SET budget_amount = nd.Budget_Amount
WHEN NOT MATCHED THEN INSERT
VALUES
(nd.Department_Number, nd.Department_
Name, nd.Budget_Amount,
nd.Manager_Employee_Number);

In this SQL example, these rules apply:

  • The VALUES clause implies that the structure of the source table (NewDept) does not need to match the structure of the target table (Department).
  • The ON condition must reference the PI of the target table.
  • The PI value in the insert must match the ON value.
  • The WHEN MATCHED and WHEN NOT MATCHED fields imply an upsert is to be performed.

The following two merge statements have the same goal: to insert rows from a source table into a target table. In the first example, only inserts rows are expected, as might usually be done through an insert-select. Therefore, no PI matches are expected:

Example 2: 
MERGE INTO Orders2
USING Orders3
ON ORDERS3.Order_Number = Orders2.
Order_Number
WHEN NOT MATCHED THEN INSERT
Orders3.order_number, Orders3.
invoice_number,
Orders3.customer_number, Orders3.
initial_order_date,
Orders3.invoice_date, Orders3.
invoice_amount);

The next merge statement illustrates an interesting notion related to SQL. Recall from the merge rules listed earlier that the PI of the target table must be referenced in the ON value. In the previous example, duplicate rows are not a consideration because the con­dition ON ND.DEPARTMENT_NUMBER = DEPARTMENT.DEPARTMENT_NUMBER and the WHEN NOT MATCHED clause combine to process inserts. In this next example, the extra condition added to the ON clause will always cause the WHEN MATCHED condition to evaluate false, so it will not be checked:

Example 3: 
MERGE INTO Orders2
USING Orders3
ON ORDERS3.Order_Number = Orders2.
Order_Number AND 1=0
WHEN NOT MATCHED THEN INSERT
(Orders3.order_number, Orders3.invoice_number,
Orders3.customer_number, Orders3.
initial_order_date,
Orders3.invoice_date, Orders3.
invoice_amount);

For this syntax, the following condi­tions apply:

  • If only inserting is needed, the form shown above will never match because the 1=0 value will always be false.
  • If the 1=0 value is always false, then only inserts will be attempted for each source row.
  • Because duplicate rows for SET tables do not fail the request, they are silently discarded with this merge, as they are with any insert-select operation.

Regarding merge performance, if USI exists on the target table, the second merge example will outperform the insert select function. However, if USI does not exist on the target table, then the insert select func­tion could be slightly faster.

Explain comparisons

In comparing the explains of the two syn­taxes, note that the join condition is further qualified in the second syntax while it is not in the first. As an unsatisfiable condition (1=0), all rows from the sourcing tables in the second merge will be attempted as inserts.

Explain of the merge step for Example 1:
4) We do an all-AMPs merge with unmatched inserts into DLM.Orders2 from DLM.Orders3
with a condition of (“DLM.Orders2.order_number = DLM.Orders3.order_number”).
The number of rows merged is estimated with high confidence to be 2,143,100 rows.
Explain of the merge step for Example 2:
4) We do an all-AMPs merge with unmatched inserts into DLM.Orders2 from DLM.Orders3
with a condition of (“(1=0)”). The number of rows merged is estimated with high
confidence to be 2,143,100 rows.

Rollbacks

When an operation such as an insert-select fails, the transient journal is used by the database to undo, or rollback, all of the changes that have been made during the operation. The target table returns to its original state prior to the insert-select, rolling back entire transactions. Once the error is corrected, these transactions must be repeated, if desired.

—D.M.

Update-only merge

In yet another form of merge, the syntax for the update-only merge is just a variation of the insert-only syntax. In the following example, the WHEN NOT MATCHED THEN INSERT clause is removed and only the WHEN MATCHED THEN UPDATE clause is referenced. On the target table, an error table is created using the following syntax to trap failures that would normally abort the merge process:

    CREATE ERROR TABLE NewDeptErr FOR DLM.Department;

While the syntax that references the error table is optional, keep in mind that without the error table reference, any failures would abort the merge and all updates performed would be rolled back. With the error table enabled, the failures might get logged, allowing the merge to continue:

MERGE INTO Department 
USING NewDept AS ND
ON nd.Department_Number = Department.
Department_Number
WHEN MATCHED THEN UPDATE
SET budget_amount = nd.Budget_Amount
LOGGING ALL ERRORS WITH NO LIMIT;

In this merge syntax, note that:

  • The merge update can perform signifi­cantly better than its equivalent SQL update
  • This performance advantage grows even greater when USI maintenance is involved
  • Whereas duplicate rows are silently discarded for any insert-select into a SET table, they are not discarded for bulk updatesThe next syntax is a more complex example that might provide some insights into the potential of merging data. Note that the italicized subquery builds the USING list that will merge NewDept rows into Department rows but only for those departments in NewDept that have employees. The use of a subquery makes it possible to merge data more conditionally:
MERGE INTO Department 
USING
 (SELECT Department_Number,
department_name,
        Budget_Amount,
Manager_Employee_Number
    FROM NewDept
    WHERE Department_Number IN
(SELECT Department_Number
        FROM Employee)) AS m
ON m.Department_Number = Department.
Department_Number
WHEN MATCHED THEN UPDATE
SET budget_amount = m.Budget_Amount
WHEN NOT MATCHED THEN INSERT
(m.Department_Number, m.Department_
Name, m.Budget_Amount,
m.Manager_Employee_Number)
LOGGING ALL ERRORS WITH NO LIMIT;

Error tables

An error table can be used to trap many failures that would normally cause a rollback and, con­sequently, any changes made during the operation are returned to their original state. By logging the failed event in error tables, requests can fin­ish without a rollback. However, not all failures are trapped or even logged, and certain failures will abort and cause a roll back. Even with these few exceptions in which not all failures are logged, error tables might be useful tools in a growing arsenal of SQL aimed at improving ELT strategies for active data warehousing.

Reference manuals will provide a complete discussion on how error tables work and how to use them.

—D.M.

Avoid rollbacks

The Teradata Database 12 ANSI merge fea­ture provides a useful method for actively loading and transforming data from a staging table using a SQL-based strategy. This method can be enhanced through the establishment of error tables to avoid rollback or by defining a Teradata 13 No-PI table as the staging table.

While implementing the ANSI merge feature will produce the same results as any other merge technique, the results will be faster and more explicit.


Your Comment:
  
Your Rating:

Comments
 


4/27/2011 5:06:35 PM
— Anonymous
 
If the (select insert) or merge is from a view that contains a cast to the correct data type how do you avoid SQLException: ERROR [22003] [Teradata][ODBC Teradata Driver][Teradata Database] Bad character in format or data of table errors?

8/13/2010 4:06:59 PM
— Anonymous