The Rise of the Data Scientist

A new breed of database expert bridges business objectives and big data analytics.

If current forecasts prove correct, some 50 billion devices—from cars to household appliances to phones—will be generating data and silently communicating with each other by the end of this decade. Being able to analyze this data tsunami will be critical for organizations that rely on information for insights and decision making.

McKinsey Global Institute, the research branch of global management consulting firm McKinsey & Company, has identified big data analytics as an activity that’s becoming vital for business competitiveness and growth in its report “Big data: The next frontier for innovation, competition, and productivity.”

Simplify the Job

In any software market, technology evolves toward better usability and lower staffing requirements for supporting business processes. The same holds for big data analytics.

Analytical innovations, such as the Teradata Aster MapReduce Platform, can drastically simplify the role of data scientists and the skills required of them. The platform combines SQL with MapReduce to provide SQL-MapReduce, which can run complex analytics on both structured and multi-structured data to produce business insights from new and traditional data sources.

The ultimate goal of SQL-MapReduce is to minimize the knowledge data scientists need to create new analytical apps. They no longer have to understand the programming behind new functions or work out how distributed processing affects application deployments. This narrows the required skill set to SQL and an understanding of business processes, which in turn opens the door for data mining and business intelligence analysts to transition rapidly into data scientists.
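
To give a sense of the programming that the sidebar says analysts can skip, here is a minimal, single-machine Python sketch of the map, group and reduce steps behind a simple "total spend per customer" question. It is an illustration only, not the Aster platform's API: the sample data and the names `map_phase`, `shuffle`, `reduce_phase` and `total_spend` are all hypothetical, and a real MapReduce system would spread these steps across many machines.

```python
from collections import defaultdict

# Hypothetical sample data: (customer_id, purchase_amount) records.
purchases = [("c1", 20.0), ("c2", 5.0), ("c1", 12.5), ("c3", 7.0), ("c2", 9.5)]


def map_phase(record):
    """Emit one (key, value) pair for one input record."""
    customer_id, amount = record
    return customer_id, amount


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def total_spend(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = map(map_phase, records)
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


print(total_spend(purchases))
```

An analyst working in plain SQL would express the same question declaratively, for example with a `SUM` and a `GROUP BY` on the customer column, and leave the grouping and distribution to the database. That is the kind of simplification the sidebar describes.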

A new type of specialist is emerging to help enterprises use big data analytics to achieve specific business objectives: the data scientist. This expert, skilled at analyzing information collected from almost any source, can uncover less-obvious intelligence locked inside masses of seemingly impenetrable data. McKinsey believes that data scientists can pay big dividends for the companies that employ them. Retailers, for example, can increase operating margins by more than 60% by having these experts analyze large data sets to their fullest. The U.S. health care industry, meanwhile, stands to capture more than $300 billion annually in new value with their help, McKinsey finds.

Diego Klabjan, associate professor of industrial engineering and management sciences at Northwestern University, explains that a “data scientist is able to derive unique and unexpected values from data or propose changes in business processes supported by the data.” D.J. Patil, data scientist-in-residence at Silicon Valley venture capital firm Greylock Partners, who is widely credited with coining the term “data scientist” while working at LinkedIn, has an even simpler definition: “They are people to make data come alive.”

Business-Side Expertise

The insights captured by data scientists can fundamentally change how an organization does business. Klabjan notes just a few of the ways:

  • Provide business value to day-to-day operational decision making
  • Support tactical and strategic decisions and plans
  • Be at the forefront of customer relations
  • Support marketing and sales
  • Shape long-term business strategies with predictive modeling, not just mining historical data

To achieve these benefits, data scientists must have the ability to apply sophisticated data models and solutions to real-world business situations. This work can’t be accomplished without advanced business knowledge. “The expertise of a successful data scientist stretches beyond just analyzing the data into the space of data warehousing and management,” Klabjan states. “A successful analytics-based solution cannot be derived without knowledge of business operations and direct interactions with key operations workforce personnel.”

By combining advanced data insight with business savvy, a data scientist can have a powerful, positive effect on a company’s bottom line. “In every instance, I’ve seen data scientists create disproportionate value for the business,” says Patil.

Have What It Takes?

Just about any enterprise that accumulates large amounts of information about customers, consumers, prospects, competitors and others can benefit from having one or more data scientists on staff. They can work with both the business and IT sides of an organization to generate intelligence. “Very few projects are a one-man show, and thus a data scientist must be able to blend with the entire project team or lead a team,” according to Klabjan.

Yet many organizations are still unfamiliar with the concept. A data scientist’s basic task is to create insights that will help a company build revenue and enhance its competitive position. To generate such wisdom, the experts must have these attributes:

  • Statistical, data mining and machine learning skills
  • Advanced programming skills
  • An intimate knowledge of data warehousing and data management, including business intelligence (BI)
  • The ability to communicate well and “speak the business language”
  • Willingness to be a team player
  • Expertise in finding and accessing rich data sources
  • Skill in working with distributed systems and large volumes of data, regardless of hardware, software and bandwidth constraints
  • Knowledge of how to solve problems by melding multiple data sets together
  • Ability to visualize data

Where Can We Find One?

Patil advises businesses searching for data scientists to target candidates who have “intense curiosity, a passion for ‘playing in the data’ and a history of having to manipulate large volumes of data to solve a problem.”

Yet finding qualified individuals can be a struggle. “‘Data scientist’ is one of the most in-demand titles out there,” Patil observes. That demand is likely to increase for at least the next few years as more organizations recognize the need for big data analytics experts. Indeed, McKinsey estimates that the United States alone faces a shortfall of 140,000 to 190,000 people with deep analytical expertise by 2018. To meet this growing demand, however, organizations can look internally for employees who have been filling a data scientist role without the title. Most large organizations have such people within their existing advanced analytics, predictive modeling or data mining teams.

The current data scientist shortage is rooted in the fact that only a relative handful of people are able to excel at both advanced IT and business concepts. “While the vast majority of data scientists possess the technical, scientific and IT skills, it is much harder to find those who also have strong business knowledge,” Klabjan points out. “Some of these attributes are hard to be taught, since they can mostly be acquired with experience.”

Plan Ahead

Today, data scientists help businesses leverage massive amounts of data to drive better tactical and strategic decisions in marketing, sales, customer relations and other crucial areas while also helping their employers develop long-term business plans. In the future, data scientists will pay even closer attention to data generated by buyers and sellers in increasingly competitive online markets. “I think we are only scratching the surface when it comes to the value of big data,” Klabjan asserts. “After all, humans are the most intriguing entities to model and predict.”

Patil agrees. “As more businesses realize that they are critically dependent on data to navigate the competitive landscape, it is only natural that data scientists will play an increasingly prominent role in every organization,” he explains.

I have seen some sites claiming that DJ Patil at LinkedIn (as on your site) and Jeff Hammerbacher coined the term "Data Scientist" in 2009. This is probably wrong, since the U.S. National Science Foundation used the term "Data Scientist" as far back as its September 2005 report. Please see the link: http://www.nsf.gov/pubs/2005/nsb0540/ or http://www.nsf.gov/pubs/2005/nsb0540/nsb0540.pdf

3/7/2013 8:13:39 AM
— Anonymous
Fuzzy Logix