Ask the Experts
Effective management for the Teradata analytic environment.
Many businesses understand the value of a single, integrated data warehouse. Through it, users across the enterprise can access the organization’s entire scope of data and obtain analytics that go beyond customer information and product inventory. From this information, countless decisions are made every day.
However, at times, an enterprise has good business reasons for storing data in a system separate from the primary data warehouse. When little value can be gained from integrating certain data with the rest of the warehouse, or when the volume stored and price/performance requirements call for a different kind of platform, there may be good justification to hold this data in a separate, dependent analytical system. For example, as an extension of its main data warehouse, an organization might use a data mart to store less critical data that requires specific analysis. On the other hand, valuable data such as millions of call-detail records or billions of customers’ Web clicks is important for analysis but might be considered too large to store and analyze effectively in the enterprise data warehouse (EDW). Yet as long as it is merely stored in isolation, this untouched data delivers zero return to the bottom line.
When business justification calls for more than one system, organizations need a way to manage, monitor and control all aspects of the analytical environment—from the hardware and software components to the process and flow of data. This level of management keeps the complexity of the ecosystem low while providing high value to the business.
Definition and moving parts
The Teradata analytical ecosystem isn’t just an EDW with associated extract, transform and load (ETL) systems and business intelligence (BI) tools and reports. It is a collection of systems and ETL processes to get the data from source systems into the decision-making environment, and data replication and copying processes to move data from system to system to meet application needs. Users from across and outside the enterprise use applications, Web sites, automated dashboards and reports, and ad hoc SQL to make all types of decisions throughout the day. The entire environment must be monitored, managed and controlled so the maximum value is provided to the organization at the least cost.
This analytic environment is, by design, dynamic. Data is moved or copied from system to system. New data areas are added. New application experiments are run in a “sandbox” and then, potentially, moved into one of the permanent systems. Usage varies with business cycles and unexpected events within the business environment. Change is the norm rather than the exception. And automated rather than manual management of the environment, not just individual systems, must be the standard operating procedure. (See figure.)
Within a Teradata analytical ecosystem, once data enters the environment via the ETL process, it must flow to the right places at the right time for the intended business uses. Teradata Data Mover and Teradata Replication Services can move the data between systems in near real time, on a scheduled basis or in response to an administrator’s request.
Teradata Query Director routes each query to an available system that holds the necessary data, so users and applications don’t have to be concerned with the complexities of the environment or be aware when the analytical ecosystem changes.
Overseeing the entire analytical ecosystem and keeping administrators and users apprised of key information are Teradata Multi-System Manager and Teradata Viewpoint. They allow a unique level of coordinated management at the business, application and data level rather than just at the system level.
Teradata Multi-System Manager gives users an intuitive view into the workflow of the analytical ecosystem, along with other important system indicators. Through its various portlets, system administrators can intelligently orchestrate all of the moving parts within the analytical ecosystem so that they operate smoothly, react to changes and exceptions, and keep administrators and users informed. Teradata Viewpoint is the robust, Web-based management console that enables administrators to easily manage the dynamic architecture of data, systems and processes to keep the ecosystem in balance and running efficiently.
Let’s look at an example of the interdependencies within the analytical ecosystem and how the cause-and-effect relationship of the moving parts affects the way a business operates:
A retailer’s purchasing agent needs to review the current inventory and determine which products should be included in the upcoming annual sale. The purchasing agent knows that the inventory information is refreshed in the data warehouse every day at 2 p.m. and is ready by 3 p.m. To get the most accurate and current inventory information, she plans to run the reports after 3 p.m.
At 1 p.m., unknown to the purchasing agent, the ETL server that loads the data warehouse fails. What is the effect on the business?
Without an “umbrella” understanding of the interdependencies of the analytical ecosystem, it would be difficult for anyone to know all of the cause-and-effect relationships and what they mean to the operations of the business. However, when the business has an overall understanding of these interdependencies, the effect on business operations will be fully understood and the appropriate actions can be taken.
It turns out that no critical ETL jobs are running at 1 p.m., so the failure of the ETL server has no immediate impact on the business. At 1:30 p.m., a non-critical weekly run of demographic data is loaded into the warehouse. If this data is not loaded on time, the business experiences no immediate impact, as this data doesn’t change significantly from week to week. At 2 p.m., the mission-critical inventory data is loaded into the warehouse. This must be started on time in order to be completed by 3 p.m. So there is a one-hour window, from 1 p.m. to 2 p.m., during which the ETL server must be repaired and back in service before users of the inventory data are affected.
Using Teradata Multi-System Manager, the DBA can see the impact of these events on the analytical ecosystem and can make a timely decision regarding what to do about the inventory data. Additionally, Teradata Multi-System Manager can automatically send a targeted notice to the users of that information, alerting them that the information they expect to be ready is, in fact, not ready. Those users can then determine the best course of action given these circumstances.
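The reasoning in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Teradata Multi-System Manager works internally; the `LoadJob` class, the `repair_deadline` function, the job names and the calendar date are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class LoadJob:
    name: str
    start: datetime   # scheduled start time
    critical: bool    # True if a late load affects business users


def repair_deadline(failure: datetime, jobs: List[LoadJob]) -> Optional[datetime]:
    """Return the start time of the first critical job at or after the failure."""
    upcoming = [job.start for job in jobs if job.critical and job.start >= failure]
    return min(upcoming) if upcoming else None


day = datetime(2024, 1, 15)  # arbitrary date, chosen only for illustration
jobs = [
    LoadJob("weekly_demographics", day.replace(hour=13, minute=30), critical=False),
    LoadJob("daily_inventory", day.replace(hour=14), critical=True),
]

failure = day.replace(hour=13)             # ETL server fails at 1 p.m.
deadline = repair_deadline(failure, jobs)  # 2 p.m., start of the inventory load
window = deadline - failure                # time available for the repair
print(window)                              # 1:00:00
```

The non-critical demographics load is ignored when computing the deadline, which is why the window runs to 2 p.m. rather than 1:30 p.m.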
Management of the analytical ecosystem
The orchestration of three important tasks is needed to manage the analytical ecosystem:
- Data movement and synchronization
- User or query routing
- Monitoring and control
Data movement and synchronization
Data is brought in from source systems via scheduled ETL jobs. Some organizations have a handful of load jobs, while others have thousands defined. Each load job has a purpose: Unique data is brought into the warehouse at an exact time to meet an explicit analytical purpose for specific users. A dependency chain can be followed as the data is first received from the source system and loaded into certain tables, then indexes and summary tables are updated, and, finally, queries are run by users. Any disruption along this path will have a downstream impact.
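One way to picture such a dependency chain is as a directed graph that is walked from the point of disruption. The sketch below assumes a hypothetical set of tables, indexes and reports; none of the names come from a real system, and it is not a description of any Teradata product's internals.

```python
from collections import deque
from typing import Dict, List

# Hypothetical dependency graph: each key feeds the items listed as its value.
DOWNSTREAM: Dict[str, List[str]] = {
    "source_extract": ["inventory_table"],
    "inventory_table": ["inventory_index", "inventory_summary"],
    "inventory_index": ["stock_report"],
    "inventory_summary": ["stock_report", "sales_dashboard"],
}


def affected_by(failed: str, graph: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk returning everything downstream of a failed step."""
    seen = set()
    order = []
    queue = deque(graph.get(failed, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already reached through another path
        seen.add(node)
        order.append(node)
        queue.extend(graph.get(node, []))
    return order


impacted = affected_by("inventory_table", DOWNSTREAM)
print(impacted)
# ['inventory_index', 'inventory_summary', 'stock_report', 'sales_dashboard']
```

With a map like this, a failed load can be translated into the list of reports and dashboards whose users should be notified.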
Once the data is in the warehouse, it’s not always at its final destination. Many enterprises have more than one analytical system, either for high-availability failover or to run special-purpose workloads. In such cases, Teradata Data Mover or Teradata Replication Services can copy and synchronize data between these systems. Synchronized data comes in two forms: outside data that is loaded into one system, and transactional data that is changed directly in the data warehouse. In either case, state information needs to be maintained to understand and track what data is where, and what synchronization is required.
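As a rough illustration of what such state information might look like, the sketch below records, for each table, the last load version applied on each system and derives the copies still needed. The structure, the version numbers and the system names are assumptions made for this example; they do not describe how Teradata Data Mover or Teradata Replication Services track state.

```python
from typing import Dict, List, Tuple

# Hypothetical state: table -> {system name -> last applied load version}
SyncState = Dict[str, Dict[str, int]]


def pending_syncs(state: SyncState) -> List[Tuple[str, str, str]]:
    """List the (table, source, target) copies needed to bring every copy up to date."""
    work = []
    for table, versions in sorted(state.items()):
        latest = max(versions.values())
        # Use the alphabetically first up-to-date system as the copy source.
        source = min(name for name, v in versions.items() if v == latest)
        for system, version in sorted(versions.items()):
            if version < latest:
                work.append((table, source, system))
    return work


state: SyncState = {
    "inventory": {"edw": 42, "standby": 41},     # standby is one load behind
    "call_detail": {"appliance": 7},             # lives on one system only
    "demographics": {"edw": 12, "standby": 12},  # already in sync
}

print(pending_syncs(state))  # [('inventory', 'edw', 'standby')]
```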
User or query routing
When an organization has more than one analytical system, user and query routing becomes important: each query must be directed to a system that can satisfy it. In some cases, the data for a query resides on only one system. Such queries can go only to that system, and there is no failover capability should it become unavailable.
In other cases, when more than one system can handle the incoming query, a routing algorithm must be applied. Teradata Query Director can balance routing of workloads between the multiple systems, or it can be used for high availability and disaster recovery. In the latter case, it redirects the query to an alternate system when the primary one is unavailable.
User query routing can be manual or transparent. Manual routing usually means that a conscious decision is made to reroute queries to an alternate system followed by human intervention to make that rerouting happen. With transparent routing, end users are unaware of the change to where their queries are routed, and the rerouting requires no effort on their part. When the service level goal is to provide continuous and uninterrupted service to end users, transparent routing is preferred.
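The routing ideas in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not Teradata Query Director's algorithm: the `AnalyticSystem` class, the `route` function and the least-loaded selection rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class AnalyticSystem:
    name: str
    tables: Set[str]          # data held on this system
    available: bool = True
    active_queries: int = 0   # crude stand-in for current load


def route(required: Set[str], systems: List[AnalyticSystem]) -> AnalyticSystem:
    """Pick the least-loaded available system that holds all required tables."""
    candidates = [s for s in systems if s.available and required <= s.tables]
    if not candidates:
        raise LookupError("no available system holds the required data")
    chosen = min(candidates, key=lambda s: s.active_queries)
    chosen.active_queries += 1
    return chosen


primary = AnalyticSystem("primary", {"inventory", "sales"})
standby = AnalyticSystem("standby", {"inventory", "sales"}, active_queries=3)
mart = AnalyticSystem("mart", {"web_clicks"})
systems = [primary, standby, mart]

first = route({"inventory"}, systems)   # primary: available and least loaded
primary.available = False               # simulate an outage
second = route({"inventory"}, systems)  # standby takes over
print(first.name, second.name)          # primary standby
```

The caller issues the same `route` call before and after the outage, which is the essence of transparent routing. A query for `web_clicks`, held only on `mart`, has no alternate system and would raise an error if `mart` were unavailable.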
Monitoring and control
To orchestrate the moving parts of the analytical ecosystem and understand their interrelationships, monitoring and control capabilities are needed.
Teradata Viewpoint enables information from multiple systems and processes to be brought together into one simple dashboard and customized for exactly the information each administrator needs.
Teradata Multi-System Manager gives a simplified and unified view of the state of all components, processes and data within the ecosystem and helps translate all of this information into an easily digestible visual interface. It assimilates a lot of information into a simple message so that system administrators can know the operational state of the ecosystem, and end users can know the readiness state of their applications.
The monitor and control capabilities provided by these two tools also enable the ecosystem to report on itself, giving system administrators and end users targeted information that is relevant to them.
Teradata’s analytical ecosystem monitoring and control allow for contextual monitoring so that alerts are analyzed given the circumstances at hand. A system outage at 10 a.m. will have effects on business operations that are different from those of a system outage at 3 a.m. These events should be looked at in the appropriate context.
Alert thresholds can also be set for events that happen within the environment and for events that were supposed to happen but didn’t. Initiating an alert for a critical event such as a failed server is expected; initiating an alert for a scheduled event, such as a load job that was supposed to start at 2 p.m. but didn’t, is proactive management.
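Both ideas, alerting on an event that failed to occur and weighing an alert by its context, can be sketched briefly. The names, the five-minute grace period and the business hours below are assumptions chosen for illustration, as is the simplification that an outage during business hours matters more; a real site would define its own rules.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass
class ScheduledEvent:
    name: str
    expected_start: datetime
    actual_start: Optional[datetime] = None  # None means it has not started


def missed_start(event: ScheduledEvent, now: datetime,
                 grace: timedelta = timedelta(minutes=5)) -> bool:
    """True if the event should have started by now (plus grace) but has not."""
    return event.actual_start is None and now > event.expected_start + grace


def outage_severity(at: datetime,
                    business_open: time = time(8),
                    business_close: time = time(18)) -> str:
    """Rate an outage by time of day; the hours are illustrative assumptions."""
    return "high" if business_open <= at.time() < business_close else "low"


day = datetime(2024, 1, 15)  # arbitrary date
load = ScheduledEvent("daily_inventory_load", day.replace(hour=14))

late = missed_start(load, now=day.replace(hour=14, minute=10))
print(late)                                   # True
print(outage_severity(day.replace(hour=10)))  # high
print(outage_severity(day.replace(hour=3)))   # low
```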
Out of isolation
Adding a special-purpose data appliance to the analytical environment for analyzing dormant or overlooked data has great potential to increase business value and return on investment (ROI) to the organization. A data mart in isolation, with out-of-date information and a lack of management oversight, only adds complexity and costs.
Real value comes from integrating the special-purpose systems with the rest of the Teradata analytical ecosystem, driving data and analytics back into the organization where they will be the most useful. Teradata provides the robust toolset to conquer the complexity of having multiple systems, and it administers seamless management of the ecosystem from front to back.