Hands On

Reveal Relationships

Teradata Aster graph discovery highlights complex networks to show hidden connections.

Teradata® Aster Discovery Platform 6 is sparking sharp interest in graph discovery and the unique value it can deliver to businesses. Graph reporting, a precursor to graph discovery that helps find basic connections such as an individual’s friends in a social network, has been available for years but has limited analytic value.

The graph analytics capabilities enabled by Teradata Aster Discovery Platform 6 are different. The discovery platform uses algorithms that process an entire graph to extract deep insights, such as identifying the biggest influencers in a social network, and provide answers to complicated analytics problems in a timely manner—something graph reporting databases can’t do well.

Graphs and Graph Discovery

A graph is composed of two parts: objects called vertices, which represent entities such as customers, machines, companies or products; and edges, which are objects that connect the vertices and denote interdependencies or interactions between entities. Edges can be emails, calls or purchases. Graph structures can be used to represent large groups, such as social, fraud or communication networks.
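As a rough illustration of these two parts, the sketch below holds a tiny call network in plain Python. The customer names and calls are made-up sample data, not taken from any Teradata example, and the adjacency-list layout is just one common way to store a graph.

```python
from collections import defaultdict

# Hypothetical sample data: vertices are customers, edges are calls
# from one customer to another.
vertices = {"alice", "bob", "carol"}
edges = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("carol", "alice"),
]

# Adjacency list: each vertex maps to the vertices it has called.
adjacency = defaultdict(list)
for caller, callee in edges:
    adjacency[caller].append(callee)

# A simple structural metric: how many calls each customer placed.
out_degree = {v: len(adjacency[v]) for v in vertices}
```

Even this simple out-degree count is a structural metric of the kind graph algorithms compute, though real influence measures look at the whole network rather than one vertex at a time.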

Graph discovery refers to the application of mathematical algorithms to graph structures. These algorithms can compute structural or statistical metrics to help identify entities that play key roles in the network represented by the graph. The metrics output helps organizations solve intricate business challenges, such as identifying influential employees for retention, detecting fraud in an online community, and determining product affinities and recommendations by exploiting community buying patterns.

Augmenting Content With Context

Graph discovery can enable dramatic improvements in decision making. This is because it allows context-based decision models, which take into account interdependencies between entities, to be combined with content-based decision models, which treat an entity as a discrete unit of analysis.

For example, a telecommunications company might seek to target influential customers with high positive sentiment for new product offers. These individuals would be more likely to accept the offer and influence others to inquire about the product. Influencers can be identified by applying graph analysis to the call network, which is derived from call detail records. The customers with high positive sentiment scores can be found through analysis of text logs kept by the call center. The targeted customers are identified by joining the results of these two analyses and finding those who fit the profile. (See figure)

Six-Step Process

The Teradata® Aster SQL-GR™ analytic engine process follows these steps:

  1. Read data from relational tables, files and remote sources
  2. Initialize the graph structure
  3. Execute iteration by all vertices
  4. Pass messages and update aggregators
  5. Finish or prepare for the next iteration
  6. Output the data
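
The six steps can be sketched as a small vertex-centric loop. This is a minimal illustration in Python, not the SQL-GR API: the function name run_supersteps, the in-memory edge list and the choice of algorithm (labeling connected components by the smallest vertex id) are all assumptions made for the example.

```python
def run_supersteps(edges):
    """Label each vertex with the smallest vertex id in its component."""
    # Step 1: read data (an in-memory edge list stands in for tables or files).
    # Step 2: initialize the graph structure and per-vertex state.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    label = {v: v for v in neighbors}

    # Every vertex starts by sending its own label to its neighbors.
    inbox = {v: [] for v in neighbors}
    for v in neighbors:
        for n in neighbors[v]:
            inbox[n].append(label[v])

    supersteps = 0
    while True:
        supersteps += 1
        outbox = {v: [] for v in neighbors}
        changed = 0  # aggregator: number of vertices updated this iteration

        # Step 3: execute the iteration for all vertices.
        for v in neighbors:
            best = min(inbox[v], default=label[v])
            if best < label[v]:
                label[v] = best
                changed += 1
                # Step 4: pass messages and update the aggregator.
                for n in neighbors[v]:
                    outbox[n].append(best)

        # Step 5: finish, or prepare for the next iteration.
        if changed == 0:
            break
        inbox = outbox  # messages are delivered only in the next iteration

    # Step 6: output the data.
    return label, supersteps


labels, steps = run_supersteps([(1, 2), (2, 3), (4, 5)])
```

Messages sent during one iteration are only read in the next one, which is what lets all vertices in an iteration be processed independently.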

Next-Generation Processing Engine

Teradata Aster Discovery Platform 6 features a native processing engine for graph analysis across big data sets. Teradata Aster SQL-GR™ is a next-generation analytic engine that performs parallel analysis of massive graph structures on clusters of commodity class servers.

SQL-GR integrates tightly with SQL and SQL-MapReduce® engines. It can be invoked through a single SQL interface, empowering analysts and data scientists to discover high-impact insights that derive and combine context- and content-based decision models in a single expression—a capability unique to the Teradata Aster Discovery Platform. The sample query below shows how to identify influential customers.

In this example, the PageRank function in the first subselect performs an influencer analysis over a customer call network derived from call detail records. The call network is represented by two tables, cust_table and cdr_table. Each row in the former is treated as a vertex (a customer), and each row in the latter represents an edge (a call from one customer to another). The output of the subselect with the PageRank graph function is cust_table with an extra column giving the PageRank (influencer score) of each customer. The probabilistic algorithm runs iteratively until it converges.
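
To make the iterate-until-convergence idea concrete, here is a minimal PageRank sketch in plain Python. It is not the Aster function: the damping factor of 0.85, the handling of customers with no outgoing calls, and the stopping rule (total change below a threshold, set here to 1e-8 to echo the THRESHOLD argument in the query) are assumptions, and the sample customers and calls are invented.

```python
def pagerank(vertices, edges, damping=0.85, threshold=1e-8, max_iter=1000):
    """Return a PageRank score per vertex for a directed edge list."""
    n = len(vertices)
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in vertices
        }
        # Each vertex shares its rank equally among the vertices it points to.
        for v in vertices:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for dst in out_links[v]:
                    new_rank[dst] += share
        # Stop once the scores barely change between iterations.
        delta = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < threshold:
            break
    return rank


# Hypothetical call network: three customers call "c", who calls "a" back.
customers = ["a", "b", "c", "d"]
calls = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]
scores = pagerank(customers, calls)
```

In this toy network the customer who receives the most calls ends up with the highest score, and the customer that person calls back comes second.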

The EXTRACTSENTIMENT function in the second subselect computes a sentiment score for each customer based on analysis of text logs from the call center. These text logs are stored in Aster File Store (AFS), which is new for Teradata Aster Discovery Platform 6, and are passed to the EXTRACTSENTIMENT function via TABLE_FROM_AFS, which is used to access AFS data. AFS is a location-independent file system that uses a local cache to reduce the workload and increase the performance of a distributed computing environment.

This subselect returns the sentiment score for each customer based on analysis of these call center logs. The results of the graph and sentiment analysis subselects are joined via SQL and ordered according to highest PageRank score.
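
The join and scoring step can be mimicked in a few lines of Python. The 0.60 and 0.40 weights, the normalized sentiment (opinion_sum divided by word_count) and the column names come from the sample query. The rows themselves are invented, and the function name combine is hypothetical.

```python
def combine(pagerank_rows, sentiment_rows):
    """Inner-join graph and sentiment results, then order by PageRank."""
    sentiment_by_customer = {row["customerid"]: row for row in sentiment_rows}
    joined = []
    for pr in pagerank_rows:
        s = sentiment_by_customer.get(pr["node"])
        if s is None:
            continue  # inner join: skip customers with no sentiment row
        normalized_sentiment = s["opinion_sum"] / s["word_count"]
        joined.append({
            "customerid": pr["node"],
            "pagerank": pr["pagerank"],
            "normalized_sentiment": normalized_sentiment,
            "score": 0.60 * normalized_sentiment + 0.40 * pr["pagerank"],
        })
    # Order by highest PageRank, as the article describes.
    return sorted(joined, key=lambda row: row["pagerank"], reverse=True)


# Invented sample rows standing in for the two subselect outputs.
graph_output = [
    {"node": "c1", "pagerank": 0.5},
    {"node": "c2", "pagerank": 0.3},
    {"node": "c3", "pagerank": 0.2},
]
sentiment_output = [
    {"customerid": "c1", "opinion_sum": 1, "word_count": 10},
    {"customerid": "c2", "opinion_sum": 8, "word_count": 10},
]
result = combine(graph_output, sentiment_output)
```

Customer c3 drops out because it has no sentiment row, which is the behavior of the inner join in the WHERE clause of the query.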

Find the Good and the Bad

Teradata® Aster Discovery Platform 6 can be used to solve a range of business problems. For instance, it can identify organizations, individuals and machines suspected of being involved in fraud by looking at patterns of interactions. An iterative algorithm called “loopy belief propagation” can be applied to a graphical model, which represents conditional dependence among random variables using a graph structure.

The algorithm iterates until convergence, so the number of steps is not known in advance. It performs statistical inference on the graphical model to estimate, for each entity, the probability that it is fraudulent, based on its relationship networks and the intensity of interactions such as invoices and emails. In other words, the probability reflects guilt by association: an entity closely linked to a known fraudulent person or company is more likely to be involved in fraud.
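
A stripped-down sketch of the idea follows, assuming a pairwise model with two states per entity (good, bad) and a single fixed edge strength. The function name loopy_bp, the 0.8 same-state affinity and the prior values are assumptions for illustration. A real model would weight edges by interaction intensity, which this sketch does not do.

```python
def loopy_bp(priors, edges, same=0.8, threshold=1e-9, max_iter=100):
    """Return {vertex: [p_good, p_bad]} beliefs after message passing."""
    neighbors = {v: [] for v in priors}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Edge potential: linked entities tend to share the same state.
    psi = [[same, 1.0 - same], [1.0 - same, same]]

    # One message per direction of every edge, initially uninformative.
    msgs = {(s, t): [0.5, 0.5] for s in neighbors for t in neighbors[s]}

    for _ in range(max_iter):
        new_msgs = {}
        for (s, t) in msgs:
            # Combine the sender's prior with messages from its other neighbors.
            pre = list(priors[s])
            for u in neighbors[s]:
                if u != t:
                    pre = [pre[i] * msgs[(u, s)][i] for i in range(2)]
            out = [sum(pre[i] * psi[i][j] for i in range(2)) for j in range(2)]
            total = sum(out)
            new_msgs[(s, t)] = [x / total for x in out]
        delta = max(
            (abs(new_msgs[k][i] - msgs[k][i]) for k in msgs for i in range(2)),
            default=0.0,
        )
        msgs = new_msgs
        if delta < threshold:  # converged: messages stopped changing
            break

    beliefs = {}
    for v in priors:
        b = list(priors[v])
        for u in neighbors[v]:
            b = [b[i] * msgs[(u, v)][i] for i in range(2)]
        total = sum(b)
        beliefs[v] = [x / total for x in b]
    return beliefs


# Hypothetical chain a - b - c, where "a" is a known fraudulent entity.
priors = {"a": (0.01, 0.99), "b": (0.5, 0.5), "c": (0.5, 0.5)}
beliefs = loopy_bp(priors, [("a", "b"), ("b", "c")])
```

In this chain the suspicion attached to the known bad entity fades with distance: the direct contact ends up more suspect than the contact of the contact.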

Examining networks also enables organizations to improve online recommendations by putting digital communities into context. A graph algorithm called “Personalized Salsa,” a randomized, iterative algorithm that can determine a “circle of trust” for each user in a social graph, might be applied here to determine a “circle of similarity” for products. The algorithm is applied to a bipartite graph derived from product purchases, in which customers and products are represented as vertices and edges link customers to products via purchases.

The analysis considers community-buying effects, which is a better approach than algorithms that require items to be purchased together. This lets companies identify online targets for new products or upselling.
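
The sketch below illustrates the flavor of such an analysis on a purchase graph. It is not Personalized Salsa itself: it swaps in a deterministic random walk with restart that alternates between products and customers, which is a related but simpler technique. The function name circle_of_similarity, the 0.2 restart probability and the purchase data are all assumptions.

```python
def circle_of_similarity(purchases, seed, restart=0.2, iterations=50):
    """Rank products by closeness to a seed product via shared buyers."""
    buyers = {}  # product -> customers who bought it
    basket = {}  # customer -> products they bought
    for customer, product in purchases:
        buyers.setdefault(product, []).append(customer)
        basket.setdefault(customer, []).append(product)

    score = {p: 0.0 for p in buyers}
    score[seed] = 1.0
    for _ in range(iterations):
        # Step from products back to the customers who bought them.
        customer_mass = {c: 0.0 for c in basket}
        for p, mass in score.items():
            for c in buyers[p]:
                customer_mass[c] += mass / len(buyers[p])
        # Step forward from customers to everything they bought.
        nxt = {p: 0.0 for p in buyers}
        for c, mass in customer_mass.items():
            for p in basket[c]:
                nxt[p] += (1.0 - restart) * mass / len(basket[c])
        # With probability `restart`, the walk jumps back to the seed.
        nxt[seed] += restart
        score = nxt

    ranked = sorted(
        (p for p in score if p != seed), key=lambda p: score[p], reverse=True
    )
    return ranked, score


# Invented purchases: (customer, product) pairs.
purchases = [
    ("u1", "phone"), ("u1", "case"),
    ("u2", "phone"), ("u2", "case"), ("u2", "charger"),
    ("u3", "charger"), ("u3", "laptop"),
]
ranked, score = circle_of_similarity(purchases, seed="phone")
```

Note that the laptop receives a nonzero score even though nobody bought it together with the phone. It is reached through a customer who shares a product with a phone buyer, which is the community-buying effect described above.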

Graph-Parallel Analysis

Graphs can sometimes get very large, so graph discovery must be able to partition the graph, distribute the computation and analyze components in parallel. In contrast to data-parallel processing architectures, which parallelize an analysis using independent subtasks, graph-parallel processing architectures cannot partition a graph without cutting edges and therefore must provide for parallel subtask communication. And unlike data-parallel algorithms, which can be executed using a finite and predetermined number of data flow steps, graph algorithms are iterative, often requiring an unbounded number of iterations until global convergence is reached.
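
A small Python sketch shows why partitioning a graph forces communication. The modulo placement rule and the function name partition_edges are assumptions chosen for simplicity. Real engines may place vertices differently.

```python
def partition_edges(edges, num_workers):
    """Split edges into those local to one worker and those that are cut."""
    def owner(vertex):
        # Assumed placement rule: integer vertex id modulo worker count.
        return vertex % num_workers

    local, cut = [], []
    for src, dst in edges:
        if owner(src) == owner(dst):
            local.append((src, dst))
        else:
            cut.append((src, dst))  # needs a message between two workers
    return local, cut


# Hypothetical five-edge graph spread over two workers.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
local, cut = partition_edges(edges, num_workers=2)
```

Here four of the five edges cross the partition boundary, so most of the work in each iteration involves exchanging messages between workers rather than purely local computation.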

Teradata Aster Discovery Platform 6 tackles these issues using a graph-parallel execution capability based on a general-purpose bulk synchronous parallel (BSP) framework. The BSP framework is fronted by a graph engine that can manage and drive the analysis of graph structures larger than the available physical memory. Because the graph does not have to fit in memory, the technology is potentially more advantageous from a price and performance perspective.

A new graph analysis package, provided with the Teradata Aster Analytic Foundation, contains out-of-the-box functions adapted for graph-parallel analysis. These functions enable end users to perform large-scale graph analysis from the comfort of SQL. Developers who wish to extend these capabilities can write their own graph functions using the intuitive vertex-oriented API exposed by the graph engine. An Eclipse-based SDK provides a user-friendly environment for composing and testing user-defined graph functions.

Enormous Business Potential

Teradata Aster Discovery Platform 6 is the first commercial product to offer support for true, large-scale graph analytics. The solution offers pre-built functions that can be customized based on the type of analysis being performed. The discovery process combines context- and content-based decision models to generate better predictions. These features allow organizations to identify complex relationships in order to promote products, improve sales, reduce fraud and achieve other business benefits.

The question is not whether graph analytics can impact your business. It’s whether you will achieve that impact before your competitors do. Using graph analytics to increase revenues, improve customer loyalty and reduce waste will help achieve outstanding business performance. The wealth of information and insight currently hidden in graphs is enormous.

Identify Influential Customers

//Find top 10 most influential customers with high positive 
SELECT customerid, review, out_polarity, (.60 * normalized_
    sentiment +  .40 *pagerank) as score 
		ON cust_table AS Vertices   PARTITION BY 
		ON (SELECT count(*) FROM  cust_table) AS 
		“TotalNodesNum” DIMENSION
	    THRESHOLD(‘1E-8’) ) ) x, 
	(SELECT *, opinion_sum::double/word_count as normalized_
				    customers path(‘/
				        mapreduce.lib. input.
				OUTPUTS (‘customerid 
				        varchar’, ‘review 
			LOCALITY(‘roundrobin’ )
sentiment_lexicon.txt’) )) y
WHERE x.node=y.customerid

David Simmen is an Engineering Fellow and chief architect for Teradata Aster’s nCluster platform. He has more than 20 years of experience in enterprise information management systems.
