January 16, 2015

Pulling Insights from Unstructured Data – Nine Key Steps

Salil Godika

Data, data everywhere, but not a drop to use. Companies are increasingly confronted with floods of data, including “unstructured data”: information from email messages, social posts, phone calls, and other sources that doesn’t fit easily into traditional rows and columns. Making sense of structured data and turning it into actionable recommendations is difficult, and doing so with unstructured data is even harder.

Despite the challenge, the benefits can be substantial. Companies that commit to examining unstructured data from devices and other sources should be able to find hidden correlations and surprising insights. Analyzing this data promotes trend discovery and opens opportunities in ways that traditionally structured data cannot.

Analyzing unstructured data can be best accomplished by following these nine steps:

1. Gather the data

Unstructured data usually comes from multiple unrelated sources. You need to find the information that needs to be analyzed and pull it together. Make sure the data is relevant so that you can ultimately build correlations.

2. Find a method

You need a method in place to analyze the data, and at least a broad idea of what the end result should be. Are you looking for a sales trend, a more traditional metric, or overall customer sentiment? Create a plan for reaching a result and for what will be done with the information going forward.

3. Get the right stack

The raw data you pull will likely come from many sources, but the results have to be put into a tech stack or cloud storage in order for them to be operationally useful. Consider the final requirements that you want to achieve and then judge the best stack. Some basic requirements are real-time access and high availability. If you’re running an ecommerce firm, then you want real-time capabilities and also want to be sure you can manage social media on the fly based on trend data.

4. Put the data in a lake

Organizations that want to keep information will typically scrub it and then store it in a data warehouse. This is a clean way to manage data, but in the age of Big Data it removes the chance to find surprising results. The newer technique is to let the data swim in a “data lake” in its native form. If a department wants to perform some analysis, they simply dip into the lake and pull the data. But the original content remains in the lake so future investigations can find correlations and new results.

5. Prep for storage

To make the data useful (while keeping the original in the lake), it is wise to clean it up. For example, text files can contain a lot of noise, stray symbols, or extra whitespace that should be removed. Duplicates and missing values should also be detected so that analysis will be more efficient.
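The cleanup described in this step might look something like the following. This is only a minimal sketch using Python's standard library; the function names `clean_text` and `prepare` and the sample records are hypothetical, and which symbols count as "noise" is an assumption that will differ from one data set to the next.

```python
import re

def clean_text(raw):
    """Strip stray symbols and collapse whitespace in one free-text record."""
    # Assumption: anything other than word characters, whitespace, and
    # basic punctuation is noise. Replace it with a space.
    text = re.sub(r"[^\w\s.,!?'-]", " ", raw)
    # Collapse runs of spaces, tabs, and newlines into a single space.
    return re.sub(r"\s+", " ", text).strip()

def prepare(records):
    """Return cleaned, de-duplicated records and a count of missing values."""
    seen = set()
    cleaned = []
    missing = 0
    for raw in records:
        # Treat None and blank strings as missing values.
        if raw is None or not raw.strip():
            missing += 1
            continue
        text = clean_text(raw)
        if not text:
            missing += 1
            continue
        # Compare case-insensitively so trivial variants count as duplicates.
        key = text.lower()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned, missing

# Hypothetical sample pulled from the lake; the originals stay untouched there.
raw_records = [
    "  Great   service!!  ###",
    "great service!!",
    None,
    "",
    "Order\t#1234 arrived late :(",
]
cleaned, missing = prepare(raw_records)
print(cleaned, missing)
```

Because the function returns a new list rather than modifying its input, the raw records remain available in their native form for later investigations.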

6. Find the useful information amongst the clutter

Semantic analysis and natural language processing techniques can be used to extract specific phrases as well as the terms related to them. For example, mentions of a “location” can be found and categorized in speech in order to establish where a caller is.
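As a rough illustration of the location example, the sketch below matches a call transcript against a small list of known place names. This is a deliberately simple stand-in for real semantic analysis, not a natural language processing pipeline; the place list, the function name `extract_locations`, and the transcript are all hypothetical.

```python
import re

# Hypothetical gazetteer of places the business cares about.
KNOWN_LOCATIONS = {"bangalore", "london", "new york", "san jose"}

def extract_locations(transcript, gazetteer=KNOWN_LOCATIONS):
    """Return the known place names mentioned in a transcript."""
    text = transcript.lower()
    found = []
    for place in sorted(gazetteer):
        # Word boundaries keep "london" from matching inside "londoner".
        if re.search(r"\b" + re.escape(place) + r"\b", text):
            found.append(place)
    return found

transcript = "Hi, I'm calling from New York about an order shipped to London."
locations = extract_locations(transcript)
print(locations)
```

A lookup like this only finds places it already knows about and cannot tell which one is the caller's own location; that kind of relationship is where proper language processing earns its keep.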

7. Build relationships

This step takes time, but it’s where the actionable insights lie. By establishing relationships between the various sources, you can build a more structured database which will have more layers and complexity (in a good way) than a traditional single-source database.
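One simple way to picture this step is to link records from different sources through a shared key. The sketch below assumes every record carries a `customer_id` field, which is a hypothetical simplification; in practice, matching identities across sources is often the hard part. The function name `link_by_customer` and the sample data are likewise made up for illustration.

```python
from collections import defaultdict

# Hypothetical records from three unrelated sources.
emails = [{"customer_id": 1, "text": "Where is my refund?"}]
calls = [
    {"customer_id": 1, "location": "london"},
    {"customer_id": 2, "location": "new york"},
]
orders = [{"customer_id": 2, "total": 120.0}]

def link_by_customer(**sources):
    """Group records from every source under the customer they belong to."""
    profiles = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            profiles[record["customer_id"]][source_name].append(record)
    # Convert to plain dicts so missing keys raise instead of silently appearing.
    return {cid: dict(by_source) for cid, by_source in profiles.items()}

profiles = link_by_customer(emails=emails, calls=calls, orders=orders)
print(sorted(profiles[1]))  # sources that mention customer 1
print(sorted(profiles[2]))  # sources that mention customer 2
```

Each profile now layers several sources on top of one another, which is the added depth a single-source database lacks.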

8. Employ statistical modeling

Segmenting and classifying the data comes next. Use tools such as K-means, Naïve Bayes, and Support Vector Machine algorithms to do the heavy lifting of finding correlations. You can use sentiment analysis to gauge customers’ moods over time and how they are influenced by product offerings, new customer service channels, and other business changes. Temporal modeling can be applied to social media and forums to find the most relevant topics being discussed by your customers. This is valuable information for social media managers who want the brand to stay relevant.
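To give a feel for how one of these algorithms gauges sentiment, here is a minimal Naïve Bayes text classifier written from scratch with add-one smoothing. It is a teaching sketch under stated assumptions, not a production model: the four training sentences and their labels are invented, the tokenizer just splits on whitespace, and a real project would more likely use an established machine learning library and far more data.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the log prior for this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled customer comments.
training = [
    ("love the new app", "positive"),
    ("great support fast reply", "positive"),
    ("checkout is broken again", "negative"),
    ("slow delivery terrible support", "negative"),
]
model = train(training)
print(classify("great app", model))
print(classify("checkout broken", model))
```

Running a classifier like this over comments grouped by week or by product launch is one way to track how customer mood shifts over time.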

9. End results matter

The end result of all this work has to be condensed down to a simplified presentation. Ideally, the information can be viewed on a tablet or phone and helps the recipient make smart real-time decisions. They won’t see the prior eight steps of work, but the payoff should be in the accuracy and depth of the data recommendations.

Every company’s management is pushing the importance of social media and customer service as the main drivers of company success. However, these services can provide another layer of assistance to firms after diagnostic tools are applied to their underlying data. IT staff need to develop the skills to properly collect, store, and analyze unstructured data so that it can be compared with structured data to see the company and its users in a whole new way.

About the author: Salil Godika is Co-Founder, Chief Strategy & Marketing Officer and Industry Group Head at Happiest Minds Technologies. Salil has 18 years of experience in the IT industry across global product and services companies. Prior to Happiest Minds, Salil was with MindTree for 4 years as the Chief Strategy Officer. Before MindTree, Salil spent 12 years in the United States working for start-ups and large technology product companies like Dassault Systems, EMC and i2 Technologies. His accomplishments include incubating a new product to $30 million in revenue, successful market positioning of multiple products, global marketing for a $300 million business and multiple M&As.
