Tested under fire
Teradata support center proves it’s truly prepared in times of crisis.
by Imad Birouty
A data warehouse is a vital component of a company’s daily operations—a crucial resource upon which hundreds of applications, thousands of users and myriad business functions depend. And it’s often tied to revenue generation, cost optimization and customer service. Given this important role, companies require 24x7 critical support from their vendor to ensure the data warehouse is always available to the business.
But can a data warehouse vendor actually deliver on its support claims and be there for your business through thick and thin? Can the vendor prove it?
Teradata can.
When San Diego County was ravaged by multiple fires in 2007, the community of Rancho Bernardo, Calif., home of Teradata Research and Development and the Global Support Center (GSC), was shut down. Police barricades, detours and yellow tape surrounded the community. A million people were evacuated throughout the county, and businesses were forced to close. But the GSC continued functioning without a hitch. Its fault-tolerant design allowed every incident to be addressed normally, with no adverse effect felt by customers.

Support infrastructure
The GSC is part of Teradata Customer Services, a worldwide organization that includes local field support and regional support centers. This structure combines the flexibility and rapid adaptability of a decentralized organization with the unified command and control associated with a centralized organization. The structure is also an important part of the fault-tolerant nature of the GSC, enabling continuous customer support from multiple locations.
All priority 1 (P1) incidents requiring deep analysis or engineering-level support are sent to the GSC. Sophisticated phone routing systems ensure that incoming calls are directed to the right technical expert. Specialized teams triage the incident to find the root cause and return systems to normal operation as quickly as possible, helping companies maintain their service level goals. Depending on the problem, the resolution might be accomplished completely by remote access or it might require coordination between the GSC and local field support engineers.
The tightly aligned set of organizations and processes enables Teradata to offer customers its highest level of services. Called business-critical support, this 24x7 capability is a combination of on-site and remote support.
Engaging the business continuity plan
Teradata maintains a documented and tested business continuity plan that has been in place for several decades. The plan has been continuously updated and improved through annual testing and lessons learned. It combines trained associates with enabling technology, ranging from telephone routing systems to off-site backups to hot backup sites.
As the risk of fire entering the Rancho Bernardo area increased, so did the readiness of the Teradata disaster recovery team around the globe. Team members initiated status e-mails every three hours to all personnel involved. Additionally, they established telephone status meetings three times a day to ensure everyone was in agreement on the proper actions to take as well as how and when they would be taken. Because the GSC supports incidents internationally, the entire roster of regional support centers took part in the status meetings and contingency plans.
The business continuity plan, which details all of the support systems and specifically calls out those that underpin the GSC’s business-critical applications, was engaged on a contingency basis. GSC representatives, working together with regional support centers and the IT infrastructure group, reviewed the list of business-critical systems and applications, what the disaster recovery plan was for each, what the expected impact/recovery time was, and any actions that needed to be taken. They prepared every system for failover to alternate sites in case the primary systems were lost.
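The readiness review described above can be pictured as a simple checklist over the business-critical systems. The following is only an illustrative sketch: the record fields, the function name and the sample entries are hypothetical and are not taken from Teradata's actual plan or tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SystemReadiness:
    """One row of a hypothetical business-critical systems review."""
    name: str
    recovery_plan: str              # e.g. "fail over to alternate site"
    expected_recovery_hours: float  # expected impact/recovery time
    failover_prepared: bool         # has the system been readied for failover?
    pending_actions: List[str] = field(default_factory=list)


def systems_needing_attention(systems: List[SystemReadiness]) -> List[str]:
    """Return names of systems that are not prepared or still have open actions."""
    return [
        s.name
        for s in systems
        if not s.failover_prepared or s.pending_actions
    ]


# Hypothetical sample entries, for illustration only.
review = [
    SystemReadiness("incident tracking", "fail over to alternate site", 1.0, True),
    SystemReadiness("phone routing", "switch to alternate IP system", 0.5, False,
                    ["distribute instructions to associates"]),
]

print(systems_needing_attention(review))
```

A review of this kind is finished when the function returns an empty list, meaning every listed system is prepared and has no open actions.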
Additionally, contingency plans were engaged for loss of the primary Rancho Bernardo phone system. An alternate system using IP technology was brought online. Call routing and escalation to the right support person or group were configured and successfully tested. Primary and secondary IP addresses were distributed, along with instructions, to all associates.
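The idea of a primary and a secondary address can be sketched as a simple ordered fallback. This is a minimal illustration, not a description of the GSC's actual phone system: the function name, the reachability check and the addresses (taken from the reserved documentation ranges) are all assumptions.

```python
from typing import Callable, Optional, Sequence


def route_call(endpoints: Sequence[str],
               is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable endpoint in priority order, or None if all are down."""
    for endpoint in endpoints:
        if is_reachable(endpoint):
            return endpoint
    return None


# Hypothetical addresses from the documentation-only ranges.
PRIMARY = "192.0.2.10"
SECONDARY = "198.51.100.10"

# Simulate loss of the primary system: only the secondary answers.
available = {SECONDARY}
chosen = route_call([PRIMARY, SECONDARY], lambda addr: addr in available)
print(chosen)
```

Keeping the endpoints in a priority-ordered list means a further alternate site could be added without changing the routing logic.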
When mandatory evacuation was ordered for Rancho Bernardo and the Teradata facility was closed, GSC associates worked around the clock in their usual shifts from home and alternate locations. At the same time, a list of planned change controls was distributed to the customer service teams, so all such work could progress as scheduled. For the days when fire danger was highest, around 100 change controls were invoked with no delays or disturbances.
Every P1 (an unplanned incident or problem) that was escalated to the GSC during the fire was handled as usual, with no impact to customers. The P1s ranged from a slow-running query that could be improved with additional statistics collection to hung nodes that required a scan disk run and a memory dump uploaded to the GSC server for root-cause analysis.
After the fires passed, the Teradata facility, including all of the servers, applications, phone systems and supporting infrastructure, was unharmed. Operations returned to normal a few days later, once air quality improved and it was deemed safe for associates to return to the building.
Continuous improvement
Situations like this enable an organization to put its business continuity preparedness to the test. They also provide an opportunity for learning and process refinement through real-world experience. In the days and weeks after the fires, a full closed-loop review was implemented to identify lessons learned and areas for improvement. Teradata Customer Services management fully committed to and implemented the changes that resulted from this review.
The most important finding was the need to recognize and prepare for a partial disaster as well as a total disaster. Some of the systems in the business continuity plan were reconfigured to fail over to a secondary site with uninterrupted service, replacing a restore-from-tape strategy. This change provides high availability, not just disaster recovery.
These and other gaps were identified and subsequently closed. Cross-functional team reviews were held until each detected gap was understood and a solution was determined and executed. The business continuity plan is now more robust than its predecessor because it had to be invoked in a real-life crisis.
Proof positive
Today’s data warehouse systems are an integral part of a company’s daily operations. So before you do business with a vendor for a solution that is or will become so important, make sure the vendor is prepared to prove it can deliver the business-critical support you require.
Imad Birouty is program marketing manager for Teradata’s high-availability solutions and data mart consolidation program. Kirk Balingit, Tony Dodson, Daryl Gionnette and Brent Williams—GSC team members who were key participants in the wildfire response—contributed to this article.