A lab validation shows Teradata Aster and Apache Hadoop play complementary roles in big data analytics and processing.
Consider this: by some estimates, we create 2.5 quintillion bytes of data every day, and perhaps as much as 90% of all the data in the world today was created in just the last couple of years. This means even a relatively small business could be looking at data sets far larger than everything it has dealt with in the past, combined. It’s no surprise, then, that data analytics is top of mind for most organizations.
Effectively managing and leveraging all data, structured and multi-structured, is crucial for organizations. Businesses benefit from alternative data processing and analytical approaches that can handle all data types, regardless of size or complexity, and that provide new ways to discover insights. Several options designed specifically for large and complex data sets have entered the market, including the Teradata® Aster MapReduce Platform and the open-source Apache Hadoop.
How do these solutions compare, and what is the best use of each technology alongside your data warehouse in a unified data architecture? The answers may surprise you.
A Tale of Two Platforms
Apache Hadoop and the Teradata Aster platform each support MapReduce, but in very different implementations. Hadoop MapReduce is implemented on top of the Hadoop Distributed File System (HDFS) whereas Teradata Aster’s patented SQL-MapReduce® is implemented on a massively parallel processing (MPP) relational database. Both process extremely large data sets across a compute cluster or grid, but the performance characteristics differ radically depending on the format of data and type of processing or analytics required.
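For readers new to the model both platforms share, the following is a minimal single-process sketch of a MapReduce job (a word count). It is written in plain Python for illustration only; it uses neither Hadoop's nor Aster's APIs, and the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical. A real framework distributes these same phases across the nodes of a cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (word, 1)."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


word_counts = run_job(["big data analytics", "big data processing"])
print(word_counts)
```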
To see how they compare, Enterprise Strategy Group experts conducted hands-on testing against multiple real-world, multi-structured, large data sets using identical hardware and network specifications. The testing focused on ease of use (time to develop) and time to insight (performance across a wide variety of ETL, simple query, and advanced analytic processing).
The test results were compared to ensure that each platform returned exactly the same information after each operation. Testing showed the advantages of each platform based on the type of data management or analytical task at hand.
Teradata Aster Strengths
Testing showed that the Teradata Aster MapReduce Platform had a clear advantage as a big analytics and discovery platform for iterative, ad-hoc analysis that applies a variety of queries, analytics and hypotheses about what the data can tell the business. The end-to-end discovery process for a real-life business scenario was five times faster in Teradata Aster, which means business analysts and data scientists can uncover new business insights in hours rather than days, owing to both ease of use and analytical performance.
EASE OF USE
- The use of SQL-MapReduce significantly increased the speed with which users could develop both simple and complex queries. It also kept the platform accessible to other SQL-based applications and reduced the manpower and skillset required for ongoing development.
- Overall, development on the Teradata Aster MapReduce Platform for the scope of the test took 121 hours, compared with 323 hours on Hadoop, which often required the development of custom Java MapReduce solutions to achieve the same results. The 202-hour difference amounts to about five 40-hour work weeks.
- Queries ran an average of 35 times faster on the Teradata Aster platform, with some test cases running 416 times faster than Hadoop. This was due in large part to the unique hybrid architecture of Teradata Aster and the SQL-MapReduce framework. Rather than requiring MapReduce processing for each step in the analysis, Aster uses SQL in place of a Map (or Reduce) phase wherever SQL is more efficient, and uses MapReduce only for steps that cannot be expressed in SQL, all in a single pass of the data.
- With highly structured data, the Teradata Aster MapReduce Platform was nearly 100 times faster than Hadoop.
- With unstructured data, the Teradata Aster MapReduce Platform ran 15 times faster than Hadoop.
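The hybrid idea in the query-performance bullet above, declarative SQL where it suffices and custom code only where it does not, can be sketched roughly as follows. This is an illustration under stated assumptions, not Aster SQL-MapReduce syntax: it uses Python's built-in sqlite3 module as a stand-in database, and the clicks table, the sample rows, the "help" noise filter and the contains_path function are all hypothetical. Unlike the single-pass execution described for Aster, this sketch runs as two separate steps.

```python
import sqlite3
from itertools import groupby

# Stand-in database with a tiny, made-up click-stream table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id TEXT, ts INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [
        ("u1", 1, "home"), ("u1", 2, "product"),
        ("u1", 3, "cart"), ("u1", 4, "checkout"),
        ("u2", 1, "home"), ("u2", 2, "product"), ("u2", 3, "home"),
        ("u3", 1, "product"), ("u3", 2, "cart"),
        ("u3", 3, "help"), ("u3", 4, "checkout"),
    ],
)

# Step 1 (declarative): filtering and ordering are expressed in SQL
# instead of being hand-coded as a map phase.
rows = conn.execute(
    "SELECT user_id, page FROM clicks "
    "WHERE page <> 'help' ORDER BY user_id, ts"
).fetchall()

# Step 2 (custom logic): ordered path matching is the step this sketch
# treats as awkward to express in plain SQL.
PATTERN = ("product", "cart", "checkout")


def contains_path(pages, pattern=PATTERN):
    """Return True if `pattern` occurs as consecutive pages in `pages`."""
    n = len(pattern)
    return any(
        tuple(pages[i:i + n]) == pattern
        for i in range(len(pages) - n + 1)
    )


# Rows are already sorted by user_id, so groupby yields one group per user.
converted = [
    user
    for user, group in groupby(rows, key=lambda row: row[0])
    if contains_path([page for _, page in group])
]
print(converted)
```

Only the path-matching step needs custom code here; the filtering and sorting that would otherwise be written as a map phase stay in a single SQL statement.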
DATA LOADING, STAGING AND REFINING
- Both platforms loaded data very quickly. Although the differences were often mere seconds, Hadoop loaded data an average of 1.8 times faster than Teradata Aster. Hadoop simply copies any data type into the file system, where it can be stored and “staged” for further processing.
- Simple data refining and transformations ran an average of 1.3 times faster on Hadoop, depending on the file type and transformation logic. For unstructured text data (feeds from Twitter in this test), Hadoop ran 1.4 times faster. However, for semi-structured data such as Web click-stream data, Teradata Aster was twice as fast.
BIG DATA ANALYSIS
Even though both solutions are certainly up to the task of big data analytics, they are not created equal. Testing showed that the Teradata Aster MapReduce Platform outperforms Hadoop for large-scale data discovery or “investigative analytics,” while Hadoop can complement the Teradata Aster solution on scale-out data storage and refining.
The Teradata Aster platform offers a big data analytics solution with performance and scalability that dwarf the capabilities of traditional databases and disk arrays, while being easy to implement and manage. In addition, the Teradata Aster MapReduce Platform delivers big data insight quickly, without the need for manual, complex coding.
The patented Teradata Aster SQL-MapReduce framework makes the platform more accessible to enterprises: any user familiar with SQL can access the data and generate valuable insights without having to learn a new programming language. At the same time, developers and data scientists keep the flexibility to code new analytic functions in their language of choice and make them available to the business through SQL-MapReduce. The framework can also process relational data and multi-structured interactional data side by side.
The platform includes modules that simplify usage of MapReduce in several areas, including:
- Path and pattern analysis
- Statistical analysis
- Relational analysis
- Text analysis
- Clustering analysis
- Data transformation
- Data parsing
These modules can be mixed and matched, and they can be used with standard SQL, custom SQL-MapReduce functions or with any analytic logic that’s designed to run inside the Aster Database. This gives Teradata Aster MapReduce a huge advantage over open-source solutions that require more labor-intensive, ad-hoc development.
But Hadoop can add tremendous value in the data architecture. In fact, both solutions have specific strengths that work well together. Hadoop is very good at capturing, storing and refining unstructured and semi-structured data in its native format. This makes it a great addition to Teradata Aster, which earns its stripes when situations call for fast, iterative queries and analytics on structured and multi-structured data.
Pick a Winning Solution
When it comes to choosing big data technologies, it doesn’t have to be an either-or decision. Sometimes, the answer is “both.” So consider the value of both technologies when selecting a big data solution.