Teradata Aster graph discovery highlights complex networks to show hidden connections.
Teradata® Aster Discovery Platform 6 is sparking a sharp interest in graph discovery and the unique value it can deliver to businesses. Graph reporting, a precursor to graph discovery, helps find basic connections such as an individual’s friends in a social network. It has been available for years but has limited analytic value.
The graph analytics capabilities enabled by Teradata Aster Discovery Platform 6 are different. The discovery platform uses algorithms that process an entire graph to extract deep insights, such as identifying the biggest influencers in a social network, and provide answers to complicated analytics problems in a timely manner—something graph reporting databases can’t do well.
Graphs and Graph Discovery
A graph is composed of two parts: objects called vertices, which represent entities such as customers, machines, companies or products; and edges, which are objects that connect the vertices and denote interdependencies or interactions between entities. Edges can be emails, calls or purchases. Graph structures can be used to represent large groups, such as social, fraud or communication networks.
Graph discovery refers to the application of mathematical algorithms to graph structures. These algorithms can compute structural or statistical metrics to help identify entities that play key roles in the network represented by the graph. The metrics output helps organizations solve intricate business challenges, such as identifying influential employees for retention, detecting fraud in an online community, and determining product affinities and recommendations by exploiting community buying patterns.
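As a minimal illustration of these definitions, the Python sketch below stores a tiny call network as vertices and directed edges and computes one simple structural metric, the number of incoming calls per customer. The customer names and call records are hypothetical, and the sketch is unrelated to how the Teradata Aster platform stores graphs.

```python
from collections import defaultdict

# Hypothetical call records: each (caller, callee) pair is one directed edge.
calls = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("dave", "bob"),
    ("bob", "alice"),
    ("dave", "carol"),
]


def build_graph(edges):
    """Return (vertices, adjacency) for a directed edge list."""
    vertices = set()
    adjacency = defaultdict(list)
    for src, dst in edges:
        vertices.update((src, dst))
        adjacency[src].append(dst)
    return vertices, adjacency


def in_degree(vertices, adjacency):
    """Count incoming edges per vertex, a simple structural metric."""
    counts = {v: 0 for v in vertices}
    for targets in adjacency.values():
        for dst in targets:
            counts[dst] += 1
    return counts


vertices, adjacency = build_graph(calls)
degrees = in_degree(vertices, adjacency)
most_called = max(degrees, key=degrees.get)
```

In-degree only looks at a vertex's immediate neighborhood. Metrics such as PageRank take the whole graph into account, which is why they need iterative algorithms.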
Augmenting Content With Context
Graph discovery can enable dramatic improvements in decision making. This is because it allows context-based decision models, which take into account interdependencies between entities, to be combined with content-based decision models, which treat an entity as a discrete unit of analysis.
For example, a telecommunications company might seek to target influential customers with high positive sentiment for new product offers. These individuals would be more likely to accept the offer and influence others to inquire about the product. Influencers can be identified by applying graph analysis to the call network, which is derived from call detail records. The customers with high positive sentiment scores can be found through analysis of text logs kept by the call center. The targeted customers are identified by joining the results of these two analyses and finding those who fit the profile. (See figure)
Next-Generation Processing Engine
Teradata Aster Discovery Platform 6 features a native processing engine for graph analysis across big data sets. Teradata Aster SQL-GR™ is a next-generation analytic engine that performs parallel analysis of massive graph structures on clusters of commodity class servers.
SQL-GR integrates tightly with SQL and SQL-MapReduce® engines. It can be invoked through a single SQL interface, empowering analysts and data scientists to discover high-impact insights that derive and combine context- and content-based decision models in a single expression—a capability unique to the Teradata Aster Discovery Platform. The sample query below shows how to identify influential customers.
In this example, the PageRank function in the first subselect performs an influencer analysis over a customer call network derived from call detail records. The call network is represented in the two tables cust_table and cdr_table. Each row in the former is treated as a vertex (a customer) and each row in the latter represents an edge (a call from one customer to another). The output of the subselect with the PageRank graph function is the cust_table with an extra column indicating the page rank (influencer score) of the customer. The probabilistic algorithm runs iteratively until it converges.
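To make "runs iteratively until it converges" concrete, here is a small single-machine PageRank sketch in Python. It is not the platform's PAGERANK function. The damping factor of 0.85 and the iteration cap are assumptions, the customer names are hypothetical, and the threshold simply mirrors the THRESHOLD argument in the sample query.

```python
def pagerank(vertices, edges, damping=0.85, threshold=1e-8, max_iter=1000):
    """Power-iteration PageRank over a directed edge list.

    damping and max_iter are assumed values; threshold mirrors the
    THRESHOLD('1E-8') argument in the article's sample query.
    """
    vertices = list(vertices)
    n = len(vertices)
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Rank held by vertices with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in vertices}
        for src in vertices:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        # Stop once the total change between iterations is below the threshold.
        delta = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < threshold:
            break
    return rank


# Hypothetical customers (vertices) and calls (edges).
customers = ["alice", "bob", "carol", "dave"]
calls = [("alice", "bob"), ("carol", "bob"), ("dave", "bob"),
         ("bob", "alice"), ("dave", "carol")]
ranks = pagerank(customers, calls)
top_customer = max(ranks, key=ranks.get)
```

The result is one score per customer, analogous to the extra column that the PageRank subselect adds to cust_table.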
EXTRACTSENTIMENT in the second subselect computes a sentiment score for each customer based on analysis of text logs from the call center. These text logs are stored in Aster File Store, which is new for Teradata Aster Discovery Platform 6, and passed to the EXTRACTSENTIMENT function via TABLE_FROM_AFS, which is used to access Aster File Store (AFS) data. An AFS is a location-independent file system that uses a local cache to reduce the workload and increase the performance of a distributed computing environment.
The second subselect yields the sentiment score for each customer based on analysis of these call center logs. The results of the graph and sentiment analysis subselects are joined via SQL and ordered by a combined score that weights normalized sentiment and PageRank, highest first.
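The join and scoring step can be sketched outside SQL as well. The Python below assumes the two analyses have already produced per-customer results. It reuses the 0.60 and 0.40 weights and the opinion_sum / word_count normalization visible in the sample query, while the customer ids and values are made up for illustration.

```python
# Hypothetical outputs of the two analyses, keyed by customer id.
pagerank_scores = {"c1": 0.47, "c2": 0.44, "c3": 0.05, "c4": 0.04}
sentiment_rows = [
    # (customerid, opinion_sum, word_count)
    ("c1", 6, 40),
    ("c2", -4, 50),
    ("c3", 9, 30),
]


def score_customers(pageranks, sentiments, w_sentiment=0.60, w_pagerank=0.40):
    """Join the two result sets on customer id and rank by weighted score."""
    scored = []
    for customerid, opinion_sum, word_count in sentiments:
        # Inner-join semantics: keep only customers present in both results.
        if customerid not in pageranks or word_count == 0:
            continue
        normalized_sentiment = opinion_sum / word_count
        score = (w_sentiment * normalized_sentiment
                 + w_pagerank * pageranks[customerid])
        scored.append((customerid, score))
    # Highest combined score first, like ORDER BY score DESC.
    return sorted(scored, key=lambda row: row[1], reverse=True)


ranking = score_customers(pagerank_scores, sentiment_rows)
```

Here customer c3 overtakes c2 despite a much lower PageRank, because the sentiment term carries more weight. That is the practical effect of combining content with context.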
Graphs can sometimes get very large, so graph discovery must be able to partition the graph, distribute the computation and analyze components in parallel. In contrast to data-parallel processing architectures, which parallelize an analysis using independent subtasks, graph-parallel processing architectures cannot partition a graph without cutting edges and therefore must provide for parallel subtask communication. And unlike data-parallel algorithms, which can be executed using a finite and predetermined number of data flow steps, graph algorithms are iterative, often requiring an unbounded number of iterations until global convergence is reached.
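The point about cutting edges is easy to see on a toy graph. The following Python sketch assigns vertices to two partitions with a naive round-robin rule (an assumption chosen for simplicity, not the platform's partitioning scheme) and lists the edges whose endpoints land on different partitions. Each of those edges implies communication between parallel subtasks.

```python
def partition_vertices(vertices, num_partitions):
    """Assign each vertex to a partition, round-robin over sorted ids."""
    return {v: i % num_partitions for i, v in enumerate(sorted(vertices))}


def find_cut_edges(edges, assignment):
    """Return the edges whose endpoints live on different partitions."""
    return [(src, dst) for src, dst in edges
            if assignment[src] != assignment[dst]]


# Hypothetical call network.
customers = ["alice", "bob", "carol", "dave"]
calls = [("alice", "bob"), ("carol", "bob"), ("dave", "bob"),
         ("bob", "alice"), ("dave", "carol")]

assignment = partition_vertices(customers, num_partitions=2)
cut = find_cut_edges(calls, assignment)
cut_ratio = len(cut) / len(calls)
```

With this naive assignment, four of the five calls cross the partition boundary, so most of the work on one partition depends on data held by the other.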
Teradata Aster Discovery Platform 6 tackles these issues using a graph-parallel execution capability based on a general-purpose bulk synchronous parallel (BSP) framework. The BSP framework is fronted by a graph engine that can manage and drive the analysis of graph structures larger than the available physical memory. This makes the technology potentially more advantageous from a price and performance perspective.
A new graph analysis package, provided with the Teradata Aster Analytic Foundation, contains out-of-the-box functions adapted for graph-parallel analysis. These functions enable end users to perform large-scale graph analysis from the comfort of SQL. Developers who wish to extend these capabilities can write their own graph functions using the intuitive vertex-oriented API exposed by the graph engine. An Eclipse-based SDK provides a user-friendly environment for composing and testing user-defined graph functions.
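For readers unfamiliar with the bulk synchronous parallel model, the sketch below shows its general shape in plain Python: in each superstep every vertex processes its incoming messages independently, then a barrier delivers the outgoing messages for the next superstep. It finds connected components by propagating the smallest vertex label. The function and variable names are hypothetical, it runs on a single machine, and it does not represent the vertex-oriented API of the Aster graph engine.

```python
def bsp_connected_components(vertices, edges, max_supersteps=100):
    """Label each vertex with the smallest vertex id in its component."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    label = {v: v for v in vertices}

    # Initial superstep: every vertex sends its own label to its neighbors.
    inbox = {v: [] for v in vertices}
    for v in vertices:
        for n in neighbors[v]:
            inbox[n].append(label[v])

    supersteps = 0
    while any(inbox.values()) and supersteps < max_supersteps:
        outbox = {v: [] for v in vertices}
        # Compute phase: each vertex reads only its own inbox and label,
        # so these iterations could run in parallel.
        for v in vertices:
            if inbox[v]:
                smallest = min(inbox[v])
                if smallest < label[v]:
                    label[v] = smallest
                    for n in neighbors[v]:
                        outbox[n].append(smallest)
        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
        supersteps += 1
    return label, supersteps


labels, steps = bsp_connected_components(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("b", "c"), ("d", "e")],
)
```

The loop ends when no vertex sends a message, which is the global convergence condition described earlier. The number of supersteps depends on the shape of the graph rather than on a fixed plan.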
Teradata Aster Discovery Platform 6 is the first commercial product to offer support for true, large-scale graph analytics. The solution offers pre-built functions that can be customized based on the type of analysis being performed. The discovery process combines context- and content-based decision models to generate better predictions. These features allow organizations to identify complex relationships for promoting products, improving sales, reducing fraud and other business benefits.
The question is not whether graph analytics can impact your business. It’s whether you will achieve that impact before your competitors do. Using graph analytics to increase revenues, improve customer loyalty and reduce waste will help achieve outstanding business performance. The wealth of information and insight currently hidden in graphs is enormous.
Identify Influential Customers
-- Find top 10 most influential customers with high positive
SELECT customerid, review, out_polarity, (.60 * normalized_sentiment + .40 * pagerank) AS score
FROM (SELECT *
      FROM PAGERANK (ON cdr_table AS Edges PARTITION BY
                     ON cust_table AS Vertices PARTITION BY
                     ON (SELECT count(*) FROM cust_table) AS
                     THRESHOLD('1E-8') ) ) x,
     (SELECT *, opinion_sum::double/word_count as normalized_
      FROM EXTRACTSENTIMENT (ON (SELECT *
                                 FROM TABLE_FROM_AFS (ON
                                 sentiment_lexicon.txt') )) y
ORDER BY score DESC
David Simmen is an Engineering Fellow and chief architect for Teradata Aster’s nCluster platform. He has more than 20 years of experience in enterprise information management systems.