Ask The Experts
Make The Connection
The Teradata Aster Big Analytics Appliance brings together Hadoop and Aster for a one-of-a-kind solution.
Hadoop and the Teradata® Aster patented SQL-MapReduce have each gained popularity by delivering business value across enterprises. Now, the Teradata Aster Big Analytics Appliance brings them together so business users can perform advanced analytics faster and more easily. Teradata Magazine spoke with Desmond Chan, Teradata Aster technical product manager, to learn more about the appliance and its unique benefits.
What is the Teradata Aster Big Analytics Appliance?
CHAN: The appliance is an integrated hardware and software solution that contains the Aster Database and Apache Hadoop to enable big data analytics. All nodes, disks, software and networking are pre-packaged in the appliance, which provides support for BI and ETL tools, and server and service management and monitoring capabilities.
It offers integrated backup nodes for Aster Database, Teradata Viewpoint integration and Hadoop nodes for storage and retrieval of larger volumes of data. It includes the SQL-compliant SQL-H interface, a feature of Aster Database v5.0. It also supports Hadoop HCatalog, an open source solution that provides metadata integration and interoperability between Pig, Hive, Hadoop MapReduce and Hadoop Streaming. The appliance uses the HCatalog metadata to access the information residing in Hadoop, allowing the structured data in the Aster Database and the unstructured data in Hadoop to be analyzed as one entity. Plus, the appliance features Teradata Server Management to support multiple systems per cabinet, a configuration made possible by InfiniBand networking.
What makes this appliance unique?
CHAN: Other appliances offer no intrinsic integration between their RDBMS and Hadoop. They’re usually separate solutions glued together with no way to connect structured and unstructured data. SQL-H, a major differentiator for the Teradata Aster Big Analytics Appliance, is a key technological advancement for business analysts and data scientists who need to include Hadoop data in their discovery and analytics processing within the Aster Database without necessarily keeping the data in the Aster Database. SQL-H allows business users to:
- Access unstructured data in Hadoop via a true SQL-compliant interface.
- Perform BI tasks through a seamless integration with BI and ETL tools, such as Tableau and MicroStrategy.
- Use more than 50 Aster SQL-MapReduce functions directly on Hadoop data within the Aster framework.
- Easily integrate data from the EDW with Hadoop data for better analysis.
- Gain better interoperability with other Hadoop projects through HCatalog.
Why is Teradata putting the Aster Database and Hadoop in the same box?
CHAN: With the increasing demand for big data analytics architecture, we believe that customers benefit the most from an intrinsic integration between the two systems in a single box. The Teradata Aster Big Analytics Appliance uses a 40Gb InfiniBand physical connection to speed node-to-node communication between the Aster Database and Hadoop nodes. On the interface side, Aster SQL-H eases administration by managing communication between the different nodes so that only the data needed for SQL queries and SQL-MapReduce functions is read from Hadoop.
Can you describe a use case that takes advantage of the Aster Database, Hadoop and the Teradata Database?
CHAN: Let’s say a business analyst needs to find relevant customers on the Teradata Database for an upcoming marketing campaign. The campaign will use cross-channel data, both structured and unstructured, that’s stored in the Teradata Database, Aster Database and Hadoop Distributed File System [HDFS].
The analyst starts with transactional data in the Teradata Database and customer web data in Hadoop. He uses SQL-H to bring the web data from Hadoop into the Aster Database, and the high-speed Teradata-Aster Adaptor to move the transactional data from the Teradata Database into the Aster Database. With both data types in the Aster Database, the SQL-MapReduce function nPath can be run on the combined data to identify the order in which each customer browsed and purchased. The result from nPath is loaded into the Teradata Database to predict customer purchases and calculate customer scores.
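To make the idea of path analysis on combined data concrete, here is a minimal Python sketch. It is not the nPath function or its syntax; it only illustrates the kind of question such an analysis answers. The sample rows, the `view:`/`buy:` event labels and the function names `customer_paths` and `viewed_then_bought` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (customer_id, timestamp, event).
# In the scenario above, web events would come from Hadoop and
# transactions from the Teradata Database.
web_events = [
    ("c1", 1, "view:shoes"),
    ("c1", 2, "view:socks"),
    ("c2", 1, "view:hats"),
]
transactions = [
    ("c1", 3, "buy:shoes"),
    ("c2", 5, "buy:scarf"),
]


def customer_paths(*sources):
    """Merge event sources and return each customer's events in time order."""
    events = defaultdict(list)
    for source in sources:
        for customer, ts, event in source:
            events[customer].append((ts, event))
    return {c: [e for _, e in sorted(evts)] for c, evts in events.items()}


def viewed_then_bought(path):
    """Return products that were viewed and later purchased in one path."""
    seen = set()
    matched = []
    for event in path:
        kind, product = event.split(":", 1)
        if kind == "view":
            seen.add(product)
        elif kind == "buy" and product in seen:
            matched.append(product)
    return matched


paths = customer_paths(web_events, transactions)
for customer in sorted(paths):
    print(customer, paths[customer], viewed_then_bought(paths[customer]))
```

In this toy data, customer c1 views shoes and then buys them, while c2 buys something never viewed, so only c1 produces a browse-then-purchase match.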
How would this work with other types of data?
CHAN: One example is Apache web logs, which record the activity and performance data of a web server. The log data is bulk loaded into Hadoop. Simple transforms can be performed there, generating tables whose metadata HCatalog maintains. Then SQL-H accesses the data and triggers any SQL-MapReduce functions for further log parsing, sessionization, attribution, nPath and more in the Aster Database. Finally, the result set of the analysis can be moved into the Teradata Database via the Teradata-Aster Adaptor, where it can be accessed by BI tools. Think about it like this: Hadoop serves as the data store for capturing and refining bulk data, the Aster Database acts as the discovery platform, and the Teradata Database acts as the data warehouse.
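The log parsing and sessionization steps can be sketched in a few lines of Python. This is an illustration of the technique only, not the Aster SQL-MapReduce functions. It assumes the Apache Common Log Format, treats the client IP address as the visitor identifier, and starts a new session after 30 minutes of inactivity; the sample log lines and the names `parse_line` and `sessionize` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Apache Common Log Format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "ip": m.group("ip"),
        "ts": ts,
        "path": m.group("path"),
        "status": int(m.group("status")),
    }


def sessionize(records, timeout=timedelta(minutes=30)):
    """Number each visitor's sessions; a gap longer than timeout starts a new one."""
    last_seen = {}
    session_no = {}
    out = []
    for rec in sorted(records, key=lambda r: (r["ip"], r["ts"])):
        ip = rec["ip"]
        if ip not in last_seen or rec["ts"] - last_seen[ip] > timeout:
            session_no[ip] = session_no.get(ip, 0) + 1
        last_seen[ip] = rec["ts"]
        out.append({**rec, "session": session_no[ip]})
    return out


# Hypothetical sample log lines.
lines = [
    '10.0.0.1 - - [10/Oct/2012:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2012:14:05:00 -0700] "GET /cart HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2012:15:30:00 -0700] "GET /index.html HTTP/1.0" 200 2326',
]
records = [r for r in map(parse_line, lines) if r is not None]
sessions = sessionize(records)
for rec in sessions:
    print(rec["ip"], rec["ts"].isoformat(), rec["path"], rec["session"])
```

The first two requests fall within 30 minutes of each other and share a session; the third arrives well over 30 minutes later and opens a second one. The resulting per-session rows are the kind of input a path analysis would then consume.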
How do the Aster Database and Hadoop interface within the Teradata Aster Big Analytics Appliance?
CHAN: There are a few options, depending on users’ needs and requirements:
- SQL-H allows users to access Hadoop data directly from the Aster Database. It lets Hadoop be integrated with standard BI and ETL tools without any modification while maintaining strong performance.
- JDBC users can establish a connection for bi-directional data transfer.
- Teradata Aster Professional Services provides Hadoop loaders in the SQL-MapReduce framework to load data between the Aster Database and Hadoop.
How do the Teradata Database and Hadoop communicate?
CHAN: Communication takes place through various data connectors. JDBC connectivity is provided on both the Teradata Database and Hadoop. Plus, Teradata provides its own extensions of the InputFormat and OutputFormat classes in the Hadoop MapReduce framework, and Teradata Parallel Transporter is available through custom access modules.
What about the Teradata and Aster Databases?
CHAN: You can leverage the Teradata Parallel Transporter between the databases by invoking it via API, script, command line and Wizard. You can also use the Teradata-Aster Adaptor for high-speed data transfer between the databases. ODBC and JDBC can be used for Teradata-Aster connectivity.
With the popularity of “free” Hadoop, why should customers buy Hadoop nodes from Teradata?
CHAN: The concept of “free” Hadoop is overrated and often mischaracterized. The software can be obtained with no license fee, but the operational costs for hardware procurement, software configuration, setup and testing are significant. Also, developing Hadoop expertise in-house can be an arduous and expensive task.
Teradata nodes for Hadoop are pre-packaged, configured and tested, and Teradata provides full support for hardware, software, OS, network, setup and operation issues. In the near future, we also plan to provide monitoring capabilities for Hadoop nodes and services through Teradata Viewpoint in a turnkey solution that eliminates many operational costs.
Where do you see Teradata Aster technology heading?
CHAN: I expect the company to devote resources toward making big data analytics in our MapReduce framework easier, more interactive and more powerful. You’ll also see enhancements in the SQL-H area to ensure customers continue to get value from Teradata, Teradata Aster and Hadoop solutions.
In the SQL-MapReduce area, functionality will be improved to make it simpler for business analysts and data scientists to perform complex analytics. In addition, we’ll enhance the reliability and availability of our systems through more backup and failover options.