Applied Solution 1
Cloud computing facilitates nimble testing of new theories or exploration of untested data.
Cloud computing represents a fundamental shift in the IT industry, moving hardware and applications into a virtual, Web-based model. The promise of cloud computing is that it transfers the burden of managing the IT environment to someone else and lets end-users focus on the benefits. Its merits include self-service, resource pooling and rapid elasticity, which translate into end-user empowerment, lower overall costs and flexible usage capacity.
These benefits apply to data warehouse users in the form of agile analytics through on-demand self-service access to a shared pool of computing resources, as well as rapid elasticity to meet business needs. This means business analysts can have quick access to data warehouse resources for fast experimental analytics. The shared platform often comes with a lower total cost of ownership (TCO) through greater utilization of hardware resources.
The data warehouse has evolved into the decision-making engine driving analytics in every aspect of the business while meeting the needs of diverse users. To meet these demands, the system requires IT rigor, safeguards and governance that accompany any business-critical system. Unfortunately, rigor and governance make it difficult for analysts to add and explore untested data or prototype new ideas. Some companies have circumvented the process by building separate experimental analytical platforms with a shadow IT group. But with the growth of analytics, it’s becoming clear that these siloed analytics environments are expensive to maintain and cannot meet the demand, growth or speed of business.
Companies need agile analytics to quickly test theories and new ideas to drive innovation and react to competitive pressures. They require flexibility in terms of speed and agility to explore new, unrefined data and experiment on new theories without long planning periods. They also must determine that the analysis was a success and then swiftly shift it into a production environment—or, they need to fail fast and move on.
A data warehouse from Teradata has always enabled business agility by allowing users to ask any question on any data at any time. What’s different is the ability to apply that same analytics agility on new, untested, uncleansed data that has not yet met the criteria to reside inside the warehouse.
Use of a Data Lab
An agile analytics data warehouse with an integrated data lab gives both the IT technician and the business analyst a way to achieve agility with new data. It helps quickly assimilate untested data into a separate "data lab" portion of the data warehouse. This provides a self-provisioning, self-service environment for swift prototyping and analysis of new, external, uncleansed data that is temporary in nature. (See figure.)
The data lab is physically contained within the enterprise data warehouse (EDW), where a portion of the EDW is set aside for non-production, experimental purposes. Users within this environment are free to load any data for ad hoc analysis.
In addition, data lab users are given read access to select production tables, allowing business analysts to include existing tables in their queries. This eliminates the need to move data outside the warehouse and prevents data duplication within it. The production platform is protected from data lab users through workload management, ensuring that unpredictable queries submitted in the data lab don't affect the service level agreements of production users.
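The read-only grants described above can be sketched as follows. This is an illustrative helper, not a Teradata utility; the user and table names are invented, and the generated statements use generic SQL `GRANT SELECT` syntax:

```python
def grant_read_only(lab_user, tables):
    """Generate GRANT statements giving a data lab user read-only
    access to selected production tables, so existing data can be
    queried in place without being copied into the lab."""
    return [f"GRANT SELECT ON {table} TO {lab_user};" for table in tables]

# Example: a hypothetical lab analyst gets read access to two
# production tables.
for stmt in grant_read_only("lab_analyst1", ["sales.orders", "sales.customers"]):
    print(stmt)
```

Because the analyst receives only SELECT privileges, production data can be joined from the lab but never modified there.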
The data being analyzed in the data lab should be treated as temporary, and after initial hypothesis testing, it should be determined whether the data has ongoing value in the warehouse. If it does, it should be sent to the IT group to become part of the standard data sourcing process. Limits in terms of the number of users and life of the data lab should be strictly enforced to prevent any temporary data from inadvertently migrating into a pseudo-production environment.
The database space should be dynamic—created and removed in cycles based on its usage. Time controls should be established up front with automated expiration dates to guarantee deletion of the non-production data.
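One way to enforce the automated expiration dates mentioned above can be sketched in Python. The lab registry, lab names and lifetimes below are illustrative assumptions, not part of any Teradata product:

```python
from datetime import date, timedelta

# Hypothetical registry of data labs: name -> (creation date, lifetime in days).
LABS = {
    "lab_marketing_test": (date(2011, 1, 10), 30),
    "lab_churn_model":    (date(2011, 3, 1), 60),
}

def expired_labs(labs, today):
    """Return the names of labs whose expiration date has passed.

    A lab expires once `today` is later than its creation date plus
    its agreed lifetime; flagged labs would have their non-production
    data deleted, keeping temporary data from becoming pseudo-production.
    """
    return [name for name, (created, lifetime_days) in labs.items()
            if today > created + timedelta(days=lifetime_days)]

print(expired_labs(LABS, date(2011, 3, 15)))  # -> ['lab_marketing_test']
```

Running such a check on a schedule guarantees that non-production data is deleted on time rather than relying on analysts to clean up after themselves.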
Additionally, assuming the user already has an account on the data warehouse, security and access rights should follow established guidelines for non-production and production data.
Workload management using Teradata Active System Management or Teradata Priority Scheduler is the first and foremost requirement for the data lab environment. It provides control over how many system resources are assigned to the data lab versus the production workload, and when.
With sophisticated controls, the user can determine:
- Which other users or queries can run at which time and at what priority
- What objects can be accessed
- The amount of CPU that can be consumed, and more
This allows the data lab to benefit from running on the production system while protecting the core production workload.
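The classification controls listed above can be illustrated with a minimal sketch. The rule table, account prefixes, priorities and CPU caps are invented for illustration and do not correspond to actual Teradata Active System Management settings:

```python
# Rules map an account-string prefix to a workload, a priority and a
# CPU cap (percent of system CPU that workload may consume).
RULES = [
    ("LAB_", "data_lab",   "low",    10),  # experimental data lab work
    ("ETL_", "batch_load", "medium", 30),  # scheduled loads
]
DEFAULT = ("production", "high", 60)       # everything else

def classify(account):
    """Assign a session to (workload, priority, CPU cap %) by the
    prefix of its account string; unmatched sessions run as production."""
    for prefix, workload, priority, cpu_cap in RULES:
        if account.startswith(prefix):
            return (workload, priority, cpu_cap)
    return DEFAULT

print(classify("LAB_analyst7"))  # lab queries run at low priority, capped CPU
```

Because lab sessions are capped at a small share of CPU, an unpredictable experimental query cannot starve the production workload of resources.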
Teradata Viewpoint provides Web-based Teradata system management portlets that extend on-demand self-service access to and easy management of the data labs.
Additionally, Teradata has a number of in-database tools and leading analytics partners to optimize the analytic process. Teradata Profiler allows analysts to maximize their data exploration experience by providing a simple interface that lets business users automatically generate statistical profiles on variables in the database and graphically analyze results.
The various partner tools include business intelligence/online analytical processing (BI/OLAP), data mining, text analytics and visualization tools.
Teradata Professional Services is an essential part of establishing and delivering a data lab within the data warehouse platform, offering start-to-finish consulting and implementation. The services team gathers customer requirements and applies best practices to designing and implementing data lab partitions with workload management, security and permissions, governance, and self-service capabilities.
Benefits of the data lab are numerous for both business and IT. This environment gives business users time savings measured in weeks and months, since projects no longer need external data marts that may require hardware and software procurement.
Users can self-provision a portion of their EDW and quickly analyze data. Data lab users can join to production data without replication or movement. Businesses can reduce their day-to-day dependence on IT with the self-provisioning and loading capabilities of the data lab. And because the data lab runs in-database, it also benefits from the Teradata Database parallel processing infrastructure, which can increase performance by orders of magnitude.
Additionally, IT gains a simplified analytics environment. Potentially new physical data marts, which would have otherwise been set up outside IT’s processes, are reduced in number and managed as virtual data labs, allowing users to centralize their analytics while promoting reuse. IT also has full control of the data labs by establishing workload management to ensure the production platform is not affected.
Plus, costs are reduced because thousands of small applications no longer run on separate servers and storage systems. These non-mission-critical applications each carry a minimum cost of hardware, licenses and labor, not to mention the inconsistent reporting results they produce. Processing within the data labs also leverages unused CPU cycles in the data warehouse, allowing businesses to maximize their system usage.
Swift Analysis Is Key
The real value of implementing an agile analytics data warehouse with an integrated data lab is the resulting business benefits. Analytics is the source of differentiated decision making and competitive advantage, so rapid analysis and action are keys to success. Testing existing data as well as new experimental data quickly and with limited investment of time and resources (from IT and non-IT) is invaluable.
Testing new theories by individuals or groups brings great value when they have autonomy of self-service, loading, reloading, analyzing and re-analyzing data when they need it. And examining the outcome of theories often depends on the amount and quality of data available. Testing new data combined with existing cleansed data in the warehouse brings together the best of both worlds.