Find the Perfect Match
Look at capabilities and benefits to determine the best database system for your business.
Selecting a database that can store and retrieve analytic information while serving an organization’s needs can be a daunting task. Because data volumes keep increasing and data is more useful than ever, database capabilities have exploded over the past few years, which makes the buying decision even more complicated. While stalwarts like the relational row-based enterprise data warehouse (EDW) continue to be popular for businesses, no single solution can satisfy all enterprise data management needs, so sometimes more than one system is required.
Although data storage remains inexpensive, costs for keeping “all data for all time” in an EDW are causing business executives to consider new solutions for handling the growing volumes of data. The key to making the data storage selection is to understand a company’s workloads—current and projected. Businesses can then pick the database that best matches those workloads. Looking at categories of information management storage and each database’s benefits can help companies make the best choice.
Relational Row-Oriented Data Warehouses and Physical Data Marts
Relational data warehouses use tables that are a collection of rows for a consistent set of columns. Row orientation describes the physical layout of the table as a series of rows, with all columns stored in the same order.
A relational row-oriented data warehouse is ideally suited for historical information and should be considered the default data store for reports and analytics. Unlike systems that exist to solve a single analytical need, the relational row-oriented data warehouse provides a permanent record of information, or retains it for as long as the organization chooses. This matters because laws dictate the minimum record keeping that companies must follow, often causing information to be retained longer in the data warehouse.
Relational data warehouses increasingly contain solid-state disks for high-use data. They support buffering of data and the reuse of previously queried results and optimizer plans. Since most historical information should be kept in the warehouse, it can provide source data to any multi-dimensional structure or data mart. The data warehouse should serve as the point where data quality is assured.
TERADATA PURPOSE-BUILT PLATFORM FAMILY
Teradata offers data warehouse and data mart platforms in which all functions in the Teradata Database are done in parallel using multiple server nodes and disks, so every query is executed on a massively parallel processing (MPP) system. The Teradata Active Enterprise Data Warehouse (EDW) line now supports more than 50% of large-scale data warehouses. The platform scales to 92 petabytes (PB) of data space and supports thousands of concurrent users on multiple applications based on its enterprise workload management.
The Teradata Data Warehouse Appliance is suitable as a general-purpose data mart, integrated data warehouse or analytical sand box. With eight MPP nodes per cabinet and scaling up to six cabinets with 55 terabytes (TB) each, the Teradata Data Warehouse Appliance can manage up to 315 TB with the workload characteristics of a typical data warehouse—multiple, complex applications serving a variety of users.
The Teradata Data Mart Appliance is a limited-capacity equivalent of the Teradata Data Warehouse Appliance and is ideal for a departmental or test development platform. It’s a single node, single cabinet with a user data capacity of 12 TB.
The Teradata Extreme Data Appliance provides a solution for managing large quantities of data. While the Active EDW tops out at 92 PB, the Extreme Data Appliance scales to 186 PB. A system of this size has fewer concurrent users because it supports deep-history analytics, not recent data reporting. Statisticians like this type of system because it offers huge amounts of data to calibrate their predictive analytic models.
Columnar Databases

Columnar databases have a clear ideal workload: queries that need only a small subset of a table’s columns. Unlike row-oriented database management systems, columnar databases store data in columns. Each physical structure contains all the values of one column of a table, which isolates each column and brings only the useful columns into a query cycle. This works around the all-too-common I/O bottleneck that analytical systems face.
Columnar databases are the best solution for handling large row lengths and large data sets. Functions such as average and sum perform better in a columnar database than in a row-oriented database. Many organizations have these types of workloads and would benefit from this system.
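The column-versus-row trade-off can be sketched in a few lines of Python. This toy example (with made-up order data, not any vendor’s storage format) shows why an aggregate such as a sum over one column touches far less data in a column store: the row store pulls every field of every record through the query cycle, while the column store reads only the one array it needs.

```python
# Toy illustration of the two physical layouts for the same table.

rows = [  # row-oriented: each record is stored contiguously
    {"order_id": 1, "customer": "acme", "amount": 120.0, "region": "east"},
    {"order_id": 2, "customer": "zeta", "amount": 75.5,  "region": "west"},
    {"order_id": 3, "customer": "acme", "amount": 310.0, "region": "east"},
]

columns = {  # column-oriented: one array per column
    "order_id": [1, 2, 3],
    "customer": ["acme", "zeta", "acme"],
    "amount":   [120.0, 75.5, 310.0],
    "region":   ["east", "west", "east"],
}

def sum_amount_rowstore(table):
    # Scans whole records; every column is dragged into the query cycle.
    return sum(record["amount"] for record in table)

def sum_amount_colstore(table):
    # Touches only the single column the query needs.
    return sum(table["amount"])

assert sum_amount_rowstore(rows) == sum_amount_colstore(columns) == 505.5
```

Both functions return the same answer; the difference in a real system is how many disk blocks must be read to get there.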
TERADATA COLUMNAR

Teradata recently unveiled its hybrid row and column database. Teradata Columnar fully integrates columnar and row-based tables for enhanced flexibility, performance and compression. It allows users to mix and match columnar and row-based physical storage to best suit an application. This enables Teradata applications to access both row- and column-structured data.
The dramatic compression offered by Teradata Columnar reduces the I/O needed to read the data into memory since the necessary information to answer a query is compressed to a fraction of its original size. This helps relieve I/O congestion.
MapReduce

MapReduce is a parallel programming framework for large-scale data. The tasks it performs are typically a small subset of those most often done with a relational database, but the data has a different profile from what is traditionally stored in a relational database.
New business opportunities sometimes require new solutions like MapReduce. The massive, unstructured, Web-scale data it can handle feeds operational processes and the many aggregations that enrich customer profiles. Each bit of data MapReduce collects may be a “gem” that drives a business process or proves useful to a batch process. Simply being able to store the unstructured data leaves the door open to future processing, if the business requires it.
What makes MapReduce different from relational databases is its highly flexible support for programming languages. A programmer can use almost any language to perform sophisticated functions and analysis without being limited by the confines of a relational database.
The ideal workload for MapReduce is massive data collected over time, as well as high-volume data from a single day. Users can load data into MapReduce, where functionality is limited to batch processing with a specific set of query capabilities. Because most MapReduce systems are built on flat files rather than a relational database tuned for performance, nearly all queries scan every file in full, even if the answer sits in the first block of disk data. MapReduce systems are primarily for unstructured data, which grows quickly but needs only batch processing and a basic set of query capabilities.
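The map-shuffle-reduce data flow can be sketched in a single process with the canonical word-count example; a real framework runs the same three phases in parallel across many nodes and flat files.

```python
# A minimal single-process sketch of the MapReduce programming model
# (word count, the canonical example).

from collections import defaultdict

def map_phase(document):
    # Map: emit a (key, value) pair for each word in an input record.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate the values collected for each key.
    return {key: sum(values) for key, values in grouped.items()}

docs = ["big data big insight", "big batch"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))

assert counts == {"big": 3, "data": 1, "insight": 1, "batch": 1}
```

Because the map and reduce steps are ordinary user-written functions, any logic a language can express can run over the data, which is the flexibility the framework trades for the query optimization a relational database provides.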
TERADATA ASTER MAPREDUCE PLATFORM
Teradata Aster offers in-database MapReduce with nCluster, which simplifies data processing across massive data sets. It also offers SQL-MapReduce—a framework to allow developers to write powerful, highly expressive SQL-MapReduce functions in languages such as Java, C#, Python, C++ and R, then push them into the platform for advanced in-database analytics.
SQL-MapReduce is architected for optimal in-database analytics execution and is best for custom transformations and aggregations, inter-row analysis, nested sub-queries and analysis that requires the reorganization of data into new structures.
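Sessionization, a frequently cited SQL-MapReduce use case, is a good example of inter-row analysis: each click’s session depends on the timestamp of the previous row. The sketch below is a conceptual Python analogue of that logic, not Aster’s actual SQL-MapReduce syntax or API.

```python
# Conceptual sessionization: assign a session number to each click,
# starting a new session whenever the gap between consecutive
# timestamps exceeds a timeout. Standard SQL expresses this kind of
# row-to-row dependency awkwardly; a custom function does it naturally.

def sessionize(timestamps, timeout=1800):
    """Assign a session id to each timestamp (seconds, sorted ascending)."""
    sessions = []
    session_id = 0
    for i, ts in enumerate(timestamps):
        if i > 0 and ts - timestamps[i - 1] > timeout:
            session_id += 1  # gap exceeds timeout: start a new session
        sessions.append(session_id)
    return sessions

clicks = [0, 60, 120, 4000, 4050, 9000]
assert sessionize(clicks) == [0, 0, 0, 1, 1, 2]
```

In the SQL-MapReduce setting, a function like this would run in-database over each user’s partition of rows, with the results queryable as an ordinary table.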
Full Range of Solutions
Different workloads require different analytic environments. As organizations deal with growing amounts of data from a greater variety of sources, they need databases that can meet their business demands. In essence, they need to match the database to the price, performance and characteristics of their data.
Teradata offers a range of systems that enable companies to integrate diverse data for enhanced business intelligence (BI). With its data warehouses, columnar database and MapReduce platform, Teradata provides the technology necessary for making information a corporate asset.