Loading...
Analyze data and prevent problems with recursive queries.

Tech2Tech

Hands On

Traverse Hierarchies

Analyze data and prevent problems with recursive queries.

We find hierarchies in our everyday lives: bills of materials, organizational structures, genealogy charts, etc. In theory, hierarchies are easy to understand. However, they can be quite a challenge when you attempt to represent and manip­ulate them using a database and SQL.

The efficient handling of hierarchies can lead to the detection of anomalies in complex systems, the uncovering of unknown rela­tionships and much more. Unfortunately, in the real world, they are often plagued by data quality problems that can negatively affect their analysis. The Teradata Database can shield you from these problems through the use of recursive queries—as long as you take a few precautions.

Querying Hierarchies

The recursive query syntax was created to en­able SQL to query hierarchical data of an un­known depth. Using the WITH clause, you can define derived tables before the main query instead of within it. A recursive query contains at least one reference to its name within its own definition and is composed of these parts:

  • A seed query UNIONed to an iterative (recursive) segment
  • At least one logical condition to pre­vent infinite loops from occurring

Analyze Levels of Data

Hierarchies are best represented in the form of a treelike structure with several levels. Their in-database representation typically relies on parent-child relationships that, in the case of an organizational chart, identify the relationship between employer and employee. Due to the nature of hierachi­cal structures, data quality issues such as missing links or cyclic child-parent-child relationships can often complicate the analysis of data.

You can easily visualize how data quality problems occur by comparing figure 1 to figure 2. Notice how the link between "22" and "222" is missing in figure 2, perhaps due to input error. Also notice that "1," "11" and "111" can lock you into a cyclic loop that never ends.

Figure 1: Simple Hierarchy

Click to enlarge

Figure 2: Hierarchy With Problems

Click to enlarge
















Recursive queries can be used to analyze hierarchical data while preventing data quality problems. As an example, the recursive SQL syntax shown below has "safety features" that help identify cyclic data loops and keep the recursion depth to a maximum of 10 levels. This syntax can be repurposed to unravel any hierarchy that follows the format in table 1 below and to let you go much deeper than just 10 levels.

WITH RECURSIVE TREE
( LEVEL, PARENT_ID, CHILD_ID, PATH) AS
( SELECT 0 AS LEVEL,


                 E.PARENT_ID,
                 E.CHILD_ID,
                 CAST (E.PARENT_ID
                    AS VARCHAR(200)) AS PATH               
   FROM TABLE_1 E
   WHERE E.PARENT_ID = 0
   UNION ALL
   SELECT TREE.LEVEL +1 ,
                   S.PARENT_ID, S.CHILD_ID,
                   CAST (TREE.PATH || S.PARENT_ID
                      AS VARCHAR(200)) AS PATH
   FROM TREE, TABLE_1 S
   WHERE TREE.CHILD_ID=S.PARENT_ID
   AND TREE.LEVEL < 10 
   AND POSITION (S.PARENT_ID IN TREE.PATH) < 1          
)
   SELECT  PARENT_ID, CHILD_ID,
                  MIN(LEVEL)+1 AS DEPTH,
                  PATH||CHILD_ID AS PATH,
                  CASE
                  WHEN (POSITION(CHILD_ID IN PATH)>0)
                             THEN 'CYCLIC' 
                             ELSE''END AS CYCLIC
                 FROM TREE GROUP BY 1,2,4,5                                        
                           
ORDER BY 4,3,1,2;
Table 1: Parent-Child Rules

Click to enlarge

Some key elements in the syntax are:

  • A materialized PATH is composed of concatenated values that show the route taken by each level of recursion.
  • POSITION determines whether a node from the hierarchy is being repeated in the PATH, thus identifying cyclic data.
  • The TREE.LEVEL < 10 effectively stops the query from going any deeper than 10 levels.
  • The WHERE E.PARENT_ID = 0 clause in the SQL is responsible for defining node 0 as the starting point. Each step of all possible routes initiated at node 0 is shown following the PARENT-CHILD row order.

It’s not uncommon to be confused by the syntax of recursive queries, so it is useful to employ recursive VIEWS to simplify the analysis of hierarchies. Using the RECURSIVE VIEW format enables the use of SET functions such as MINUS and INTERSECT that allow comparative analysis between two simi­lar hierarchies.

By using REPLACE RECURSIVE VIEW, you can start to analyze changes or anomalies introduced by data quality issues. ANSI SQL recursive queries are also available to Teradata Database users and have been for many years.

This is what syntax with recursive VIEWS can look like.

REPLACE RECURSIVE VIEW TREE_V
( LEVEL, PARENT_ID, CHILD_ID, PATH) AS
( SELECT 0 AS LEVEL,
                 E.PARENT_ID,
                 E.CHILD_ID,
                 CAST (E.PARENT_ID
                    AS VARCHAR(200)) AS PATH               
   FROM TABLE_1 E
   WHERE E.PARENT_ID = 0
   UNION ALL
   SELECT TREE_V.LEVEL +1 ,
                   S.PARENT_ID, S.CHILD_ID,
                   CAST (TREE_V.PATH || S.PARENT_ID
                      AS VARCHAR(200)) AS PATH
   FROM  TREE_V, TABLE_1 S
   WHERE TREE_V.CHILD_ID=S.PARENT_ID
   AND TREE_V.LEVEL < 10 
   AND POSITION (S.PARENT_ID IN TREE_V.PATH) < 1          
);

Table 2: Paths From Node 0

Click to enlarge

If the appropriate row changes are made to table 1, it’s possible to have the table represent the hierarchy shown in figure 2. Executing the recursive SQL syntax against a revised table 1 will produce the results shown in table 2.

You can compare the hierarchy of nodes in figures 1 and 2 using this SQL.

SELECT PARENT_ID, CHILD_ID,        
                MIN(LEVEL)+1 AS DEPTH,

                PATH||CHILD_ID AS PATH,
                CASE
                WHEN (POSITION(CHILD_ID IN
PATH) > 0)
                THEN 'CYCLIC' ELSE''END AS CYCLIC
                FROMTREE_FIG2_V GROUP BY 1,2,4,5
MINUS
SELECT PARENT_ID, CHILD_ID,
                MIN(LEVEL)+1 AS DEPTH,

                PATH||CHILD_ID ASPATH,
                CASE
                WHEN (POSITION(CHILD_ID IN PATH) > 0)
                THEN 'CYCLIC' ELSE''END AS CYCLIC
                FROM TREE_FIG1_V GROUP BY 1,2,4,5
ORDER BY 4,3,1,2;

Note that the SQL before and after the MINUS is identical to the syntax used earlier in the recursive SQL syntax to extract, group and order the results of the recursive query. A simpler analysis of changes to a hierarchy can often be carried out without the need for recursive queries at all. For example, to find out which nodes in a hierarchy are considered to be top nodes, that is, nodes without parents, you only need to SELECT all PARENT_IDs MINUS the SELECT of all CHILD_IDs for the same hierarchy.

In order to obtain the results shown in table 3, it is being assumed that TREE_FIG2_V and TREE_FIG1_V correspond to recursive VIEWs in the format shown in the recursive VIEWs syntax where the underlying TABLEs contain the parent-child relationships that correspond to figures 2 and 1, respectively. Table 3 shows the differences between figures 1 and 2, as originally intended.

Table 3: Different Paths

Click to enlarge

Data Quality Protection

Hierarchies, though easy to traverse based on their parent-child rules, often come with data quality issues that can lead to infinite loops. Recursive queries can protect you from data problems such as missing links and cyclic loops.

The WITH RECURSIVE syntax can guide you through hierarchies to find their depth, detect infinite loops and more. You’ll be surprised how easy it is to use SQL to navigate the structures and identify data quality problems.


Your Comment:
  
Your Rating:

Comments
 
I'm not sure who to submit this to. Help, please. The following that pertains to Teradata is part of a letter we are sending out to gather support for an Occupy Dental Office protest in Las Vegas. ...We further clarify the cause of the Rush Limbaugh meanness in today's America with the story on our website of a sadistic fraternity brother I knew (now a senior manager at Teradata) who got down on his knees to suck for his position when he was young and never got off them. Michael is your typical power mad corporate conservative, closet homosexual bully who gives a clear picture of what Newt Gingrich and his ilk are all about. You have to have known one of these jerks on the right close up to make sense of their vicious idiocy. The key to these double-talking, ever sneaky robots is their inability to get excited about anything but the cruelty they impose on others that partially relieves their humiliation in life from having had to submit to get power in the slave colony America horror movie all of us a

12/10/2011 12:43:39 PM
— Anonymous