Features
Making the grade
Testing and validating predictive models ensures real business benefits.
by Cindy Waxer
Imagine a multinational telecommunications company losing thousands of its customers to a fast-growing competitor. Determined to retain subscribers through special marketing programs and incentives to reduce customer churn, the telco sets out to ascertain, in advance, the propensity of certain customer segments to cancel their service in a given time period.
- First, the company enlists data mining to create a predictive model based on historical and demographic information, such as a subscriber’s gender, age, long-distance plan and handset type.
- Next, the model produces a mathematical equation designed to identify those subscribers most likely to change providers. For example, the company discovers that male customers age 24 to 32 who subscribe to DSL broadband and own a smartphone are more likely to switch than female customers age 35 to 38 who subscribe to VoIP and own a conventional cell phone.
- Finally, by creating a statistical model of its customers’ future behavior, using a powerful mix of historical data and advanced mathematics, the beleaguered telco is ready to launch its predictive model using real-world data.
This process will certainly convert fleeing customers into loyal subscribers, right?
Wrong. A crucial element—the testing and validation of the model—was skipped. This step would gauge the model’s performance and accuracy to offer higher confidence in decisions and more trustworthy results in future campaigns.
More than a model
Analytics can uncover patterns by mining large volumes of data to predict behavior—a capability that has the potential to yield impressive results, from increasing revenue to cutting the cost of customer acquisition. But while foretelling customer behavior can have a dramatic impact on a company’s bottom line, uncovering patterns through data mining to anticipate the future is only half the battle.
The beleaguered telco’s predictive model, based on historical data, revealed that a segment of young, tech-savvy, male customers is most likely to switch vendors. But what if those customers were jumping ship because the telco failed to offer network support for a particular type of popular smartphone?
Six months later, that same telco is now providing support for this device and plans a multimillion-dollar marketing campaign to inform that group of at-risk customers. But using a predictive model based solely on historical data and past market conditions could result in inaccurate targeting and a significant waste of marketing dollars. That’s because the campaign’s target customers are now receiving the proper network support, so the historical data behind the model no longer reflects their reasons for leaving.
Clearly, relying on untested models can have dire consequences. “You need to be able to test a model in the real world to ensure that the way you’re using it is effective,” advises Gareth Herschel, a Gartner research analyst. “That’s a step that a lot of organizations don’t necessarily take, but it’s one of the more significant steps of the process.”
Testing 1-2-3
For analytics to provide real business benefits, historical data must be split during the model-building process to produce two distinct historical data sets:
- Training. This set is used to create a mathematical equation capable of calculating potentially predictive relationships. For example, a university uses admissions history and applicant information to determine the likelihood that students will choose to enroll next year.
- Validation. Independent of the training data, the validation—or test—data is used to evaluate a predictive model’s strength and accuracy in establishing predictive relationships. Usually this is achieved by splitting the sample that has been created to build the model into two randomly selected parts. Bear in mind that each part needs enough cases to be statistically valid. Executing the predictive model against the set of test data determines its effectiveness.
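The split described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the `history` records, the 30 percent validation share, the fixed random seed and the stand-in `toy_model` rule are all assumptions made for the example.

```python
import random

def split_train_validation(records, validation_fraction=0.3, seed=42):
    """Randomly split historical records into training and validation sets."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(records)                 # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # seeded so the split is repeatable
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

def accuracy(predict, labelled_records):
    """Share of records where predict(features) matches the known outcome."""
    hits = sum(1 for features, outcome in labelled_records
               if predict(features) == outcome)
    return hits / len(labelled_records)

# Hypothetical history: (features, churned?) pairs invented for illustration.
history = [({"age": 20 + i % 40, "smartphone": i % 2 == 0}, i % 2 == 0)
           for i in range(1000)]

training, validation = split_train_validation(history)

# Stand-in for a model built on the training set only (an assumption;
# a real model would be fitted by a data mining tool).
def toy_model(features):
    return features["smartphone"]

# The model is scored only on records it never saw during training.
validation_accuracy = accuracy(toy_model, validation)
print(len(training), len(validation), validation_accuracy)
```

The point of the design is that the validation records play no part in building the model, so the score reflects how the model handles cases it has not seen rather than how well it memorized its own training data.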
In the deployment phase, a predictive model would be tested alongside the marketing activity, using real-world customer data. By running this data through a predictive model, and by selecting appropriate control groups, a company can collect the most up-to-date and accurate information possible on everything from the likelihood of customer churn to the monthly rate of product returns. Armed with these powerful predictive analytics, an organization can extend its marketing campaigns and promotions in ways that are both cost-effective and on target.
In control
While a predictive model deployment cycle can be completed in the span of a few days, testing a model’s true business value in a real-world setting may take more time.
In order to prove the value obtained from a predictive model, the organization should establish a control group of customers who are currently active in a company’s database. For example, a small bank might develop a model that predicts that 20,000 of its 200,000 customers are most likely to seek a mortgage in the next three months. (See figure.) A specific marketing campaign will be developed for this target group. To test the campaign and the model itself, two control groups will be established.

From the 20,000 interested customers, the financial institution creates a control group of 1,000 potential new mortgage seekers. This first group, which will not receive the targeted marketing, will test the effectiveness of the campaign. Namely, do people who aren’t sent the marketing take out mortgages at the same rate as those who are?
To test the predictive model, however, a second control group is required, this one selected from customers not identified by the model. This group will be sent the same marketing promotions. If the response rate among these randomly chosen customers is identical to that of the targeted group, the model is invalid.
Over a set period of time, the bank can compare how many members of its control groups applied for a mortgage versus the rate among the targeted customers. The result will provide a fair benchmark of the effectiveness of the campaign and model.
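The comparison can be worked through with simple arithmetic. In the sketch below, the 19,000 and 1,000 group sizes follow from the bank example, but the size of the second control group and every responder count are invented for illustration. The two-proportion z-statistic is a standard textbook check added here as an assumption; it is not a method named in the article.

```python
import math

def response_rate(group):
    """Fraction of a group that applied for a mortgage."""
    return group["responders"] / group["size"]

def two_proportion_z(a, b):
    """Pooled two-proportion z-statistic for the difference in response rates."""
    pooled = (a["responders"] + b["responders"]) / (a["size"] + b["size"])
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / a["size"] + 1 / b["size"]))
    return (response_rate(a) - response_rate(b)) / std_err

# Responder counts are hypothetical; the second control group size is assumed.
targeted_sent = {"size": 19_000, "responders": 950}    # picked by model, got campaign
targeted_holdout = {"size": 1_000, "responders": 20}   # picked by model, no campaign
untargeted_sent = {"size": 1_000, "responders": 10}    # not picked by model, got campaign

# Campaign effect: same kind of customer, with and without the marketing.
campaign_lift = response_rate(targeted_sent) / response_rate(targeted_holdout)
campaign_z = two_proportion_z(targeted_sent, targeted_holdout)

# Model effect: same marketing, model-selected versus other customers.
model_lift = response_rate(targeted_sent) / response_rate(untargeted_sent)
model_z = two_proportion_z(targeted_sent, untargeted_sent)

print(f"campaign lift {campaign_lift:.2f}, z = {campaign_z:.2f}")
print(f"model lift {model_lift:.2f}, z = {model_z:.2f}")
```

A lift near 1 on the first comparison would suggest the campaign changed little. A lift near 1 on the second would suggest the model is no better than picking customers without it. The z-statistic gives a rough sense of whether a difference is larger than chance alone would produce, which is one way to judge when the comparison has run long enough.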
Where time allows, a test campaign might be launched in advance on a small subset of the target group to see whether the offer and the entire campaign process work as planned. This shows the marketers what to expect and helps them avoid unpleasant surprises in the details of execution. Once they are satisfied with the results, they can launch the full-fledged campaign.
Nevertheless, Herschel warns that the testing process shouldn’t be rushed. “Companies need to test a model’s business value for as long as it takes to generate a valid comparison between those being targeted by a marketing campaign and those who aren’t being targeted,” he advises.
Not one and done
Companies that view the testing and validation of a predictive model as a one-time event are in for a rude awakening. For a predictive model to continue to deliver real business benefits, it must be continually updated and reviewed. Employing best practices such as these can help clarify the process:
- Create expectations. Every time a company deploys a predictive model, it expects results ranging from reduced customer churn to increased revenue. No model can promise perfection, so it’s critical that the company establish a baseline, or set of criteria, by which to measure how accurately the model predicts events. When the model’s results dip below this baseline, it’s time to test and validate the model again.
“Companies need to test and validate the way that models are used on an ongoing basis to ensure that they are used correctly and employed intelligently across the organization,” says Herschel.
- Establish ownership. When assessing the impact of a multimillion-dollar marketing campaign, it’s not uncommon for countless employees to enthusiastically lend their expertise. Not so when it comes to testing and validating predictive models—a sometimes lengthy process, especially when the data needed isn’t stored in a central data warehouse. Frequently, organizations simply don’t invest the necessary time and effort in testing whether a model performs as intended.
Complicating matters further is that the creation of a predictive model often requires input from a variety of employees, from a chief information officer to front-line call center representatives. Because of this distributed responsibility, a dangerous vacuum of accountability can form, allowing validation activities to simply fall through the cracks.
Fortunately, Herschel offers a reasonable solution. “Anybody who is using the model is a stakeholder in the validation process,” he says. “But the person responsible for designing and building that model is the person who needs to own that validation process.” After all, he adds, “The professionals responsible for developing a model typically have the best view of its effectiveness and are well-equipped to measure its impact.”
- Make the most of a data preparation tool. Testing and validating a predictive model often involves altering its analytic data set (ADS), a single table in the database in which each row corresponds to a single customer and each column corresponds to a variable describing a particular aspect of that person’s recent behavior (e.g., average balance or largest transaction amount). Although altering an ADS is difficult, a tool such as the Teradata ADS Generator can serve as a straightforward means for evaluating a model’s parameters. That’s because this unique data preparation tool provides a graphical user interface and wizards to simplify the ADS-building process.
Many of the ADS-building features include wizards that facilitate and automate common tasks such as calculating new predictive variables from different data combinations, profiling data elements to identify patterns and outliers, transforming data, and integrating variables and tables into a consolidated ADS.
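To make the shape of an ADS concrete, here is a minimal sketch that rolls raw transactions up into one row per customer. It does not represent the Teradata ADS Generator, which is a graphical tool; the `transactions` data, the `build_ads` function and the column names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw data: (customer_id, transaction_amount, balance_after)
transactions = [
    ("C001", 120.0, 1500.0),
    ("C001", 40.0, 1460.0),
    ("C002", 900.0, 300.0),
    ("C002", 15.0, 285.0),
    ("C002", 60.0, 225.0),
]

def build_ads(rows):
    """Collapse transaction rows into one row per customer."""
    grouped = defaultdict(list)
    for customer_id, amount, balance in rows:
        grouped[customer_id].append((amount, balance))

    ads = []
    for customer_id, items in sorted(grouped.items()):
        amounts = [amount for amount, _ in items]
        balances = [balance for _, balance in items]
        ads.append({
            "customer_id": customer_id,
            # Each column is one variable describing recent behavior.
            "average_balance": sum(balances) / len(balances),
            "largest_transaction": max(amounts),
            "transaction_count": len(items),
        })
    return ads

ads = build_ads(transactions)
for row in ads:
    print(row)
```

Adding a new predictive variable to this table means adding one more derived column, which is the kind of repetitive task the wizards described above are meant to automate.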
Test for success
In the quest to increase revenue, reduce customer churn and identify market trends in the global economy, many organizations recognize the ability of predictive analytics to uncover patterns and yield substantial return on investment (ROI). However, those that overlook testing and validation of their predictive models are missing a key step, one that checks the model against current business objectives and real-world market conditions for better results. After all, there’s never a right time to be wrong about the future.
Cindy Waxer is a Toronto-based freelance journalist.