Embrace the Elephant in
Enrich the wealth of information in your data warehouse with Hadoop data.
How do you get your arms around an elephant? Have long arms.
While the Apache™ Hadoop® mascot hasn’t led to a trend of elephant jokes, there is some confusion about what Hadoop is, what it can do and how to best incorporate its data into existing architectures to gain maximum
Hadoop is a collection of open-source projects, including the Hadoop Distributed File System, that businesses do not have to pay to use. Hadoop enables users to keep massive amounts of all kinds of data in its original format, including video, email, Web logs and other data types that do not fit nicely into rows and columns without pre-processing. This is especially advantageous for unstructured, non-relational forms of big, diverse data since users do not have to decide on a schema before storing the information. All kinds of data can be kept—GPS data from mobile phones, information from machine sensors, Web logs with details of site usage—even if you have no idea at the present time what you might use that data for.
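The idea of storing first and deciding on a schema later can be sketched in a few lines of Python. This is a minimal illustration, not Hadoop code: the raw records, the field names and the `read_with_schema` helper are all hypothetical, and a plain list stands in for files kept in Hadoop.

```python
import json

# Raw records kept exactly as they arrived; a list stands in for files in Hadoop.
# All records and field names here are made up for illustration.
RAW_LINES = [
    '{"type": "gps", "device": "phone-1", "lat": 40.7, "lon": -74.0}',
    '{"type": "weblog", "user": "c42", "page": "/billing", "browser": "Firefox"}',
    '{"type": "sensor", "machine": "m-7", "temp_c": 81.5}',
]


def read_with_schema(lines, record_type, fields):
    """Apply a schema at read time: pick one record type and project the chosen fields."""
    rows = []
    for line in lines:
        record = json.loads(line)
        if record.get("type") != record_type:
            continue
        # Fields missing from a record come back as None instead of failing.
        rows.append({name: record.get(name) for name in fields})
    return rows


weblog_rows = read_with_schema(RAW_LINES, "weblog", ["user", "page"])
print(weblog_rows)
```

Nothing about the stored lines had to change to answer this question, and a different question later could project a different set of fields from the same raw data.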
Think of Hadoop as a reservoir for potentially valuable raw materials that you may not yet know how to use and that, in the past, you would have thrown away for lack of storage. Hadoop can store all of that unstructured data cost-effectively. In many respects, it’s liberating. Businesses have access to a depth and detail of data that, prior to Hadoop, may have been lost when the data was converted into a format that works in a relational database. In many cases, businesses kept three months’ worth of detailed Web log activity, then threw it out. Now they can keep years of activity at a low cost.
In addition, many customer interaction channels have unstructured data that requires pre-processing before analysis. Examples include call center audio files, emails and notes that field or call center agents collect about customers. The process of capturing, storing and refining this information is more feasible with Hadoop.
New Level of Understanding
By adding a level of detail from the big, diverse data stored in Hadoop to the wealth of information stored in a data warehouse, businesses can understand the habits and preferences of their customers with greater precision than in the past. In other words, you can enrich the models you currently have with additional detailed information.
Let’s say your website is designed for self-service. You’ve established that many customers try to use the website, but end up calling the support line. By extracting information from Hadoop data and adding it to your existing customer data stores, you can craft models of behavior that are much more granular. This gives you insights about where those customers are when they abandon your site and call support in frustration.
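As a rough sketch of that kind of enrichment, the Python below finds the last page each customer viewed before calling support and attaches it to an existing customer record. Everything here is assumed for illustration: the customer IDs, the sample events and the helper names are hypothetical, and small in-memory structures stand in for the data warehouse and for clickstream data extracted from Hadoop.

```python
from collections import Counter

# Hypothetical warehouse rows: one record per customer.
CUSTOMERS = {
    "c1": {"segment": "consumer"},
    "c2": {"segment": "small business"},
    "c3": {"segment": "consumer"},
}

# Hypothetical clickstream events extracted from Hadoop: (customer, timestamp, page).
PAGE_VIEWS = [
    ("c1", 100, "/home"),
    ("c1", 160, "/billing/dispute"),
    ("c2", 200, "/home"),
    ("c2", 230, "/billing/dispute"),
    ("c3", 300, "/orders/track"),
]

# Hypothetical support-call records: (customer, timestamp of the call).
SUPPORT_CALLS = [("c1", 400), ("c2", 500), ("c3", 250)]


def last_page_before_call(page_views, calls):
    """For each caller, find the last page viewed before the call was placed."""
    result = {}
    for customer, call_time in calls:
        earlier = [(ts, page) for c, ts, page in page_views
                   if c == customer and ts < call_time]
        if earlier:
            # max() on (timestamp, page) tuples picks the most recent view.
            result[customer] = max(earlier)[1]
    return result


def enrich(customers, abandon_pages):
    """Add the abandonment page to each warehouse record (None if unknown)."""
    return {cid: {**row, "abandon_page": abandon_pages.get(cid)}
            for cid, row in customers.items()}


abandon_pages = last_page_before_call(PAGE_VIEWS, SUPPORT_CALLS)
enriched = enrich(CUSTOMERS, abandon_pages)
page_counts = Counter(abandon_pages.values())
print(page_counts.most_common(1))
```

Counting the abandonment pages across callers points to the part of the site that most often precedes a support call, which is the more granular view described above.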
Since you’re keeping all the data, if you later want to investigate the impact of a new factor, like the type of browser and device used to access the site, you can get detailed information from Hadoop and enrich your model again. This might encourage you to create a new mobile app to support customers on the device of their choice.
The Best Strategy of All
Hadoop stores data efficiently while making interaction channel information cheaper to capture and refine. However, for Hadoop to add value to the business, it has to be integrated into a more comprehensive analytics framework that can effectively leverage its contents.
Incorporating the data stored in Hadoop into new or existing data models can give you great power, but as usual, that power comes with great responsibility. As an open-source project, Hadoop is constantly evolving. For example, it is still maturing when it comes to providing differentiated levels of access to its data stores: you can access either all of the data or none of it. Technologists concerned about data governance, privacy and security therefore have to architect the extraction of Hadoop data intelligently and in concert with governance policies.
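One way to picture a governance-aware extraction is to filter fields at the point where data leaves the raw store. The sketch below is an assumption about how such a step might look, not a description of any Hadoop feature: the roles, the policy table and the `extract_for_role` helper are all hypothetical.

```python
# Hypothetical governance policy: which fields each role may extract.
POLICY = {
    "marketing_analyst": {"customer_id", "page", "browser"},
    "support_lead": {"customer_id", "page", "call_notes"},
}

# Hypothetical raw record as it might sit in the raw data store.
RAW_RECORDS = [
    {"customer_id": "c1", "page": "/billing", "browser": "Safari",
     "email": "c1@example.com", "call_notes": "disputed a charge"},
]


def extract_for_role(records, role, policy=POLICY):
    """Return copies of the records containing only the fields the role may see."""
    if role not in policy:
        # Refuse the extraction outright when no policy covers the role.
        raise PermissionError(f"no extraction policy defined for role {role!r}")
    allowed = policy[role]
    return [{k: v for k, v in record.items() if k in allowed} for record in records]


print(extract_for_role(RAW_RECORDS, "marketing_analyst"))
```

Because the policy is applied during extraction, sensitive fields such as the email address never reach the downstream model unless a role is explicitly allowed to see them.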
Leveraging big, diverse data in Hadoop and combining it with the rich data stores you already have delivers a more granular view of your business than has previously been possible. So go ahead, embrace that elephant and integrate Hadoop data into your analytical framework so you use and benefit from all the data you have available, which is the best strategy of all.
Dan Woods is CTO and founder of CITO Research. He has written or co-authored more than 20 books about business and technology and has a column on Forbes.com.