June 7, 2017

How to Drive Value From Big Data: Collaboration

Mukund Deshpande


Big data can transform enterprise operations, uncovering trends that could be easily overlooked and pointing the way to new opportunities. From detecting a seemingly insignificant mechanical problem that could shut down operations to pinpointing an untapped market, the value big data offers is significant.

Yet contrary to popular belief, big data can make it harder, not easier, for businesses to extract relevant insight in a timely manner. Data volumes are rising, fed by data sets from disparate systems in formats that are both structured (Hadoop and NoSQL) and unstructured (Twitter, Facebook, blogs and shop floor readings, to name a few). These terabytes and petabytes of data do not automatically translate into the intelligence that the various departments within an organization need.

Ultimately, extracting full value from big data is a paradox — both easy and hard. For complex big data to truly be of value, organizations must embrace a model of collaboration. The adage, “It takes a village to raise a child,” applies equally well when an organization collaborates to develop big data to its fullest potential.

Here, we’ll take a look at the inherent challenges of big data and what collaborating to drive value encompasses.

Data First, Schema Later 

In the past, before the data could be ingested, the schema, or the way the data is stored, needed to be decided. This made it difficult to add new data sets. However, today’s big data platforms make it easy to ingest high volumes of data sets — a design choice known in the industry as “data first, schema later.” Essentially, data is taken in quickly, without effort to transcribe or cleanse it. How to make use of the data will be resolved later. The result is a flood of data that the average person cannot easily use.
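To make "data first, schema later" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular platform works: the in-memory `raw_store`, the `ingest` and `read_with_schema` functions, and the sample records are all hypothetical. Raw lines are stored untouched, and field names and types are applied only when someone reads the data.

```python
import json

# Hypothetical raw store: records are appended exactly as they arrive.
raw_store = []

def ingest(line):
    """Data first: keep the raw line without parsing or validating it."""
    raw_store.append(line)

def read_with_schema(schema):
    """Schema later: apply field names and types only at read time.

    `schema` maps a field name to a converter such as str or float.
    Records that do not fit the schema are skipped, not repaired.
    """
    rows = []
    for line in raw_store:
        try:
            record = json.loads(line)
            rows.append({name: cast(record[name]) for name, cast in schema.items()})
        except (ValueError, KeyError, TypeError):
            continue
    return rows

# Ingestion accepts everything, including lines that are malformed or incomplete.
ingest('{"region": "west", "sales": "120.5"}')
ingest('{"region": "east", "sales": "98"}')
ingest('not json at all')
ingest('{"region": "north"}')

# Only at read time does a schema decide which records are usable.
sales_rows = read_with_schema({"region": str, "sales": float})
```

All four lines are stored, but only two survive the read. That gap between what is ingested and what is usable is the flood the average person cannot easily navigate.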


Nevertheless, the biggest challenge most big data platforms pose is their complexity. It is difficult for someone who is not a developer, and who has only the available tools to work with, to map data and answer questions such as why sales are dropping in a specific region.

Due to complexity and volume, extracting value from big data involves a three-step process: cleaning a data set; merging the clean data with yet another data set; and finally, employing additional advanced analytical concepts to map the merged data to the business objectives. Performing these operations calls for multiple tools and possibly writing code – not a simple operation for an average business user.
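The three steps can be sketched in a few lines of plain Python. This is a simplified illustration, not a production pipeline: the sample records, the store-to-region lookup, and the function names `clean`, `merge` and `sales_by_region` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw sales records with stray spaces, mixed casing and a bad value.
raw_sales = [
    {"store": " S1 ", "amount": "100"},
    {"store": "s2", "amount": "250"},
    {"store": "S1", "amount": "n/a"},
    {"store": "S3", "amount": "75"},
]
# Hypothetical second data set that maps stores to regions.
store_regions = {"S1": "West", "S2": "West", "S3": "East"}

def clean(records):
    """Step 1: normalize store IDs and drop rows whose amount is not numeric."""
    cleaned = []
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue
        cleaned.append({"store": r["store"].strip().upper(), "amount": amount})
    return cleaned

def merge(records, regions):
    """Step 2: join each cleaned record with its region from the second data set."""
    return [dict(r, region=regions[r["store"]]) for r in records if r["store"] in regions]

def sales_by_region(records):
    """Step 3: aggregate the merged data to answer a business question."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)

totals = sales_by_region(merge(clean(raw_sales), store_regions))
```

Even in this toy form, each step needs its own logic and its own decisions, such as what to do with a non-numeric amount. At real volumes those decisions multiply, which is why the work is hard for an average business user.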

Collaboration Builds Value

It is virtually impossible today for an individual or department to understand and use the entire gamut of data being collected or its applicability to business outcomes. After all, it’s easy to get carried away with the promise of technology – processing higher volumes of data points than ever before, as quickly as possible.

However, for this ongoing stream of raw information to be truly valuable, multiple departments must be able to work in sync with the data and pool their knowledge. There must be a common language and structure so everyone can view the data, share the knowledge gained, and collaborate to the benefit of the enterprise. What is needed to enable and encourage collaboration? Departments must be able to work together on data in three areas:


1) Data set annotation

Data is a starting point for analytics. Big data has also led to an explosion of data sets, not necessarily all of good quality. Users should be able to annotate data sets with notes on quality and relevance so they can quickly and easily be discovered by others. Furthermore, as the data sets are used, the system should auto-annotate them with that usage.

2) Transparency in data visualization and analytics

When someone analyzes a data set, a snapshot of the data flow and descriptions of what has been done should be visible to others as well, under proper governance. Typically referred to as lineage, transparency across the data life cycle includes its origins, movements and visualizations.  This transparency ensures that people can easily learn from each other’s work and makes the analytics easy to maintain and enhance.

3) Analytics should be easily reused and shared

A big data system should encourage usage by others within the organization by ensuring that analytics are written in a reusable format. Starting from scratch and re-creating someone else’s efforts wastes time and resources that could be better directed towards building additional knowledge. A framework that makes it easy to write reusable analytics lets the same analytics be applied repeatedly instead of being rebuilt.
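The three areas above can be pictured together in one small Python sketch. It is a hypothetical model for illustration only, not the design of any specific product: the `DatasetEntry` class, the `run_analysis` helper, and the sample data sets are all invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    """Hypothetical catalog entry for one shared data set."""
    name: str
    rows: list
    annotations: list = field(default_factory=list)  # user notes on quality and relevance
    usage_count: int = 0                              # auto-annotation: how often it was used
    lineage: list = field(default_factory=list)       # descriptions of what was done

    def annotate(self, author, note):
        """1) Let a user record a note about quality or relevance."""
        self.annotations.append({"author": author, "note": note})

def run_analysis(entry, analysis, description):
    """Run a shared analysis and record usage and lineage along the way."""
    entry.usage_count += 1             # 1) auto-annotate the data set with its usage
    entry.lineage.append(description)  # 2) keep a visible record of what was done
    return analysis(entry.rows)

def average_amount(rows):
    """3) A reusable analysis: works on any rows that carry an 'amount' field."""
    return sum(r["amount"] for r in rows) / len(rows)

orders = DatasetEntry("orders", [{"amount": 10.0}, {"amount": 30.0}])
orders.annotate("finance", "Amounts verified against the ledger")

refunds = DatasetEntry("refunds", [{"amount": 4.0}, {"amount": 6.0}, {"amount": 8.0}])

# Two departments reuse the same analysis on different data sets.
avg_order = run_analysis(orders, average_amount, "finance: average order amount")
avg_refund = run_analysis(refunds, average_amount, "support: average refund amount")
```

The point of the sketch is that annotation, lineage and reuse reinforce one another: the same `average_amount` function serves both teams, and every run leaves a trail that the next person can read.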

It is essential that stakeholders throughout the enterprise can extract the knowledge they need from this pool of terabytes and petabytes of data, while making it available for others to use as well. And, as more and more intelligence is added, the data grows more valuable. That is the power of collaboration.

Do you have a collaborative platform that makes it easy for people in different departments to work together?

About the author: Mukund Deshpande heads the Analytics practice at Accelerite and owns the ShareInsights initiative. He has over 18 years of experience in the field of analytics, working in a variety of roles. He has been involved in several cutting-edge projects and internationally recognized award-winning initiatives. He holds a PhD in Computer Science from the University of Minnesota, where his research was on data mining techniques for sequences and graphs.

