Room for Success
3 steps to more effective capacity management processes.
by Dan Fritz, John Lind and Paul Barsch
As data volumes grow exponentially, companies face a capacity management quandary. If an IT organization doesn’t accurately forecast this growth, it could run out of data warehouse system capacity when it’s needed most. Worse, the organization might exhaust its system capacity halfway through a 12-month budget cycle. That’s a scenario no CIO wants to have to explain to the CFO.
For any business, the ability to plan and forecast resources to meet demand that drives value and addresses customer needs is critical. For CIOs and IT management, this planning matters across the breadth of the technology infrastructure, and especially for the data warehouse. Because most companies rely on the data warehouse for important daily analytics, running out of system capacity is not an option.
Maintaining a Data Warehouse Environment
Capacity management in data warehousing means understanding current and future resource usage, data capacity and performance trends. It helps maintain a data warehouse environment with sufficient performance and data capacity, and it assists companies in making the most efficient and effective use of the resources they own.
Although it may sound like a simple process, capacity management for data warehousing involves significant subtleties. Database administrators (DBAs) often don't know where to find capacity management data, much less how to measure it. For this reason, it is a best practice to create an official capacity management function or capacity analyst role within the organization.
Want better capacity management processes? Teradata proposes three steps: consider, forecast and continue. These steps are based on best practices gathered from dozens of Teradata Professional Services capacity management engagements. Following them will help you better understand and predict capacity demand for all systems, including production, quality assurance and test environments.
Step 1: Consider
To accurately discern current and future capacity requirements for a data warehouse from Teradata, two measures should be considered: disk space and system power usage.
Disk space is a common gauge because it is easy to measure and understand, yet it is usually not the main limiting factor of data warehouse performance. Measures that typically take precedence include system power usage metrics such as CPU and I/O utilization.
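One way to see which measure is the real limiting factor is to express each resource as a fraction of its capacity over the same window and pick the highest. The minimal sketch below does this; the resource names and figures are hypothetical, not values read from a Teradata system.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One capacity dimension measured over the same window (names are hypothetical)."""
    name: str
    used: float      # amount consumed in the window
    capacity: float  # maximum amount available in the same window

    @property
    def utilization(self) -> float:
        """Fraction of capacity consumed (1.0 means fully used)."""
        return self.used / self.capacity


def limiting_resource(resources: list) -> Resource:
    """Return the resource with the highest utilization, i.e. the binding constraint."""
    return max(resources, key=lambda r: r.utilization)


if __name__ == "__main__":
    # Hypothetical peak-hour figures for a single system.
    resources = [
        Resource("disk_space_tb", used=60.0, capacity=100.0),
        Resource("cpu_seconds", used=3240.0, capacity=3600.0),
        Resource("io_operations", used=7.2e6, capacity=9.0e6),
    ]
    for r in resources:
        print(f"{r.name}: {r.utilization:.0%}")
    print("limiting:", limiting_resource(resources).name)
```

With these sample figures disk space sits at 60% while CPU is at 90%, so CPU is reported as the binding constraint even though disk is the easier number to watch.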
Knowing which metrics matter for capacity management is important, but you must also identify where the data warehouse collects this data:
- ResUsage tables. These tables offer specific system-wide usage details for CPU, physical I/O, disk, memory, AMP worker tasks and more.
- Database Query Log (DBQL) and accounting (Acctg tables). These allow analysis of the processing power consumed by specific queries and provide data on finished and aborted queries.
- Disk space over time. Performance data collection processes provide historic storage data by table and database.
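Once query-level rows have been extracted from sources like these, a few simple aggregations show who consumes processing power and how much of it is lost to aborted work. The sketch below assumes the rows have already been exported into Python; the `QueryRecord` fields are hypothetical and would need to be mapped to the columns a given site actually captures.

```python
from collections import defaultdict
from typing import NamedTuple


class QueryRecord(NamedTuple):
    """One query-level row exported from query log data (field names are hypothetical)."""
    user: str
    cpu_seconds: float
    aborted: bool


def cpu_by_user(records):
    """Total CPU seconds per user, largest consumer first."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec.user] += rec.cpu_seconds
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def aborted_cpu_share(records):
    """Fraction of all CPU seconds spent on queries that were aborted."""
    total = sum(r.cpu_seconds for r in records)
    wasted = sum(r.cpu_seconds for r in records if r.aborted)
    return wasted / total if total else 0.0


if __name__ == "__main__":
    sample = [
        QueryRecord("finance", 120.0, False),
        QueryRecord("finance", 30.0, True),
        QueryRecord("marketing", 50.0, False),
    ]
    print(cpu_by_user(sample))        # {'finance': 150.0, 'marketing': 50.0}
    print(aborted_cpu_share(sample))  # 0.15
```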
A final part of this step is examining operational windows, that is, the times of day, week and month during which the data warehouse is used. Pay special attention to times of peak demand. For example, most capacity analysts would think to separate "Massive Mondays," when it is not uncommon for most reports to be run by 10 a.m., from other workloads. But what other peak times might be missed? Perhaps the last Friday of a quarter, when the finance department is the heaviest user of processing power.
The work that a data warehouse performs has distinct rhythms. To make forecasting useful, consider operational windows such as the peak hours when usage is highest, the impact of time zones on global usage, monthly fiscal reporting, and calendar effects such as workdays versus weekends. Viewing these periods of system consumption separately shows how different operational cycles affect capacity. It is also important to remember that system resources available in the afternoon, for example, cannot help meet a morning deliverable.
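Looking at windows separately can be as simple as bucketing utilization samples by day of week and hour before averaging, so a Monday-morning spike is not diluted by quiet evenings. This is a minimal sketch assuming samples arrive as hypothetical `(timestamp, cpu_pct)` pairs already converted to a single time zone; a real analysis would also add fiscal-calendar and holiday labels.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def window_key(ts: datetime) -> str:
    """Label a sample with its day-of-week and hour, e.g. 'Mon 09:00'."""
    return f"{DAY_NAMES[ts.weekday()]} {ts.hour:02d}:00"


def is_workday(ts: datetime) -> bool:
    """Monday through Friday; adjust for your own calendar and holidays."""
    return ts.weekday() < 5


def average_by_window(samples):
    """Average CPU % per day-of-week/hour window from (timestamp, cpu_pct) pairs."""
    buckets = defaultdict(list)
    for ts, cpu_pct in samples:
        buckets[window_key(ts)].append(cpu_pct)
    return {key: mean(values) for key, values in buckets.items()}


def peak_window(samples):
    """Return (window, average CPU %) for the busiest window."""
    averages = average_by_window(samples)
    return max(averages.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    samples = [
        (datetime(2024, 1, 1, 9, 15), 95.0),  # a Monday morning
        (datetime(2024, 1, 8, 9, 40), 91.0),  # the following Monday
        (datetime(2024, 1, 2, 14, 0), 40.0),  # Tuesday afternoon
        (datetime(2024, 1, 6, 10, 0), 10.0),  # Saturday
    ]
    print(peak_window(samples))  # ('Mon 09:00', 93.0)
    print(sum(1 for ts, _ in samples if not is_workday(ts)))  # 1
```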
Step 2: Forecast
Forecasting the usage of data warehouse resources is the next step. No prediction is perfect, but a good forecast requires an understanding of new projects and planned application changes. New projects may add users, data and possibly applications that access the warehouse. Identifying this additional future work helps establish parameters for both CPU and disk needs.
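Planned projects can be layered onto a baseline projection as what-if scenarios. The sketch below assumes each project's extra load simply adds to the baseline from its start month onward, which is a deliberate simplification; `PlannedChange` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlannedChange:
    """A hypothetical planned project that adds load from a given month onward."""
    name: str
    start_month: int      # 0-based index into the planning horizon
    extra_cpu_pct: float  # added average CPU utilization, in percentage points
    extra_disk_tb: float  # added disk space, in terabytes


def what_if(baseline_cpu, baseline_disk, changes):
    """Layer planned changes onto baseline monthly projections (additive assumption)."""
    cpu = list(baseline_cpu)
    disk = list(baseline_disk)
    for change in changes:
        for month in range(change.start_month, len(cpu)):
            cpu[month] += change.extra_cpu_pct
            disk[month] += change.extra_disk_tb
    return cpu, disk


def first_breach(series, limit):
    """Index of the first month whose value exceeds limit, or None."""
    for month, value in enumerate(series):
        if value > limit:
            return month
    return None


if __name__ == "__main__":
    cpu, disk = what_if(
        baseline_cpu=[60.0] * 6,
        baseline_disk=[40.0] * 6,
        changes=[PlannedChange("new BI application", 3, 15.0, 5.0)],
    )
    print(cpu)                      # [60.0, 60.0, 60.0, 75.0, 75.0, 75.0]
    print(first_breach(cpu, 70.0))  # 3
```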
Combining historical usage data with information on planned workload changes gives the capacity analyst a basis for forecasting. Various statistical forecasting techniques can be applied, including simple and multiple regression. By taking into account the relevant capacity metrics, operational windows and future workload changes, the analyst can project data warehouse capacity needs more accurately.
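As an illustration of the simplest of these techniques, the sketch below fits an ordinary least-squares line to monthly CPU utilization and estimates how many months remain before the trend reaches a planning ceiling. The data and the 80% ceiling are hypothetical; multiple regression would add further predictors such as user counts or data volume, which this sketch does not attempt.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def months_until_limit(limit, slope, intercept, current_month):
    """Months from current_month until the fitted trend reaches limit (None if flat or falling)."""
    if slope <= 0:
        return None
    crossing_month = (limit - intercept) / slope
    return max(0.0, crossing_month - current_month)


if __name__ == "__main__":
    months = [0, 1, 2, 3, 4, 5]
    cpu_pct = [50.0, 52.0, 54.0, 56.0, 58.0, 60.0]  # hypothetical monthly averages
    slope, intercept = fit_line(months, cpu_pct)
    print(slope, intercept)                               # 2.0 50.0
    print(months_until_limit(80.0, slope, intercept, 5))  # 10.0
```

A result like "10 months" is what lets the analyst compare the projected exhaustion date with the budget cycle rather than discovering it mid-year.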
With a sound forecast in hand, the analyst has a good gauge of current and future requirements. Armed with this information, he or she can work with business users to take advantage of periods when the data warehouse has idle processing power, or to reduce the impact of peak times by moving work from one operational window to another.
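Moving work between windows can be reasoned about with a simple greedy rule: shift a movable job into the least-loaded window that can absorb it without crossing a target ceiling. This sketch assumes load adds in percentage points and that the job's deadline actually allows the move (a morning deliverable cannot go to an evening window); the window names, loads and ceiling are hypothetical.

```python
def best_target_window(utilization, job_load, limit, exclude=()):
    """Least-loaded window that can absorb job_load without exceeding limit, or None."""
    candidates = [
        (load, window)
        for window, load in utilization.items()
        if window not in exclude and load + job_load <= limit
    ]
    return min(candidates)[1] if candidates else None


def move_job(utilization, source, job_load, limit):
    """Shift a movable job out of source; returns (target, updated utilization)."""
    target = best_target_window(utilization, job_load, limit, exclude=(source,))
    updated = dict(utilization)
    if target is not None:
        updated[source] -= job_load
        updated[target] += job_load
    return target, updated


if __name__ == "__main__":
    windows = {"Mon 08-12": 95.0, "Mon 12-18": 70.0, "Mon 18-24": 35.0}
    target, after = move_job(windows, "Mon 08-12", job_load=20.0, limit=80.0)
    print(target)  # Mon 18-24
    print(after)   # {'Mon 08-12': 75.0, 'Mon 12-18': 70.0, 'Mon 18-24': 55.0}
```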
Step 3: Continue
Capacity management isn't a one-time event, nor should it be an annual exercise to justify yearly budget expenditures. Because business growth often places high demands on the data warehouse, annual planning efforts might not accommodate rapid changes in the business environment. Therefore, capacity management should be an ongoing effort that monitors the pulse of business needs.
To that end, plans should be reviewed and adjusted regularly. Working alongside business users and sponsors, the capacity analyst should hold meetings at least monthly to identify new workloads and potential changes to applications. Stakeholders might include a data warehouse committee, DBA staff and application managers.
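A regular review is easier to run when actual usage is checked against the forecast automatically and only significant deviations are brought to the meeting. The sketch below flags months where actual usage differs from the forecast by more than a relative tolerance; the 10% tolerance and the figures are hypothetical choices, not a Teradata recommendation.

```python
def flag_drift(forecast, actual, tolerance=0.10):
    """Return (month_index, relative_deviation) where actual differs from forecast by more than tolerance."""
    flagged = []
    for month, (expected, observed) in enumerate(zip(forecast, actual)):
        if expected == 0:
            continue  # relative deviation is undefined for a zero forecast
        deviation = (observed - expected) / expected
        if abs(deviation) > tolerance:
            flagged.append((month, round(deviation, 3)))
    return flagged


if __name__ == "__main__":
    forecast = [60.0, 62.0, 64.0]  # hypothetical forecast CPU %, by month
    actual = [61.0, 70.0, 63.0]    # hypothetical observed CPU %
    print(flag_drift(forecast, actual))  # [(1, 0.129)]
```

Months that keep appearing in this list are a signal that the plan, not just the system, needs adjusting.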
Commitment Needed
An ongoing capacity management effort should be a critical function in data warehousing processes. It includes a well-documented plan that considers current disk space and system processing power and that allows for what-if scenarios for new applications, users and/or data planned for the data warehouse. To avoid surprises and embarrassment, you must commit to capacity management as a key discipline to consistently meet the needs of your evolving business.
Dan Fritz is the Services Offer Manager for Teradata Performance and Capacity Services.
John Lind is a Teradata Professional Services consultant developing capacity management tools while at field engagements.
Paul Barsch directs Teradata Professional Services marketing programs for Teradata Performance.