A Storyboard Approach to Big Data Insights
Big data by itself is just a worthless collection of numbers and characters. To make big data work, you need to show how the information is meaningful. Taking a storytelling approach to analytics is one way to put big data in context.
One of the analytics vendors that’s well-versed in telling data-driven stories is ClearStory Data, which develops a Hadoop-based application that lets users explore big-data feeds, find relevant insights, and share them with others. ClearStory is one of the first Hadoop vendors to use the in-memory Apache Spark framework, which drives the harmonization of big, fast-moving data feeds.
At this week’s Strata + Hadoop World conference, the Menlo Park, California, company unveiled a new way for customers to interact with the application. The new Storyboards feature is designed to make it easier to share the insights gleaned from ClearStory’s Spark-powered data harmonization engine.
Instead of sharing just a single big-data storyline in a single dashboard, the new Storyboards mode enables groups of users to share multiple storylines in an interactive manner. Not only can users see how different big data sets correlate and compare, they can watch as the story unfolds, similar to scenes unfolding in a movie, says ClearStory Data founder and CEO Sharmila Mulligan.
“You think of every view as a frame or a scene,” she tells Datanami. “You’re looking at scenes in a movie, and it’s giving you the end-to-end story on what’s happening in the situation… Then people who are looking at it can click on it and start question[ing] what’s being described and shown, and iterate on it to make sure that everybody’s concluding the right thing.”
Previously, ClearStory supported the telling of single-threaded storylines. The Spark in-memory layer could harmonize and join up to 24 separate data feeds, and ensure that the data was continuously refreshed and up-to-date. However, users were limited in what they could share. With StoryBoards, storylines are now multi-threaded: several of them can be shared and explored together.
“It’s one thing when you’re looking at a single story with that much data, which is what we did before, collaboratively, interactively, in real time,” Mulligan says. “Now we’re taking this many live views and pulling it all into a single interactive storyboard that can continue to expand their story line and be explorable.”
The idea is to democratize access to big data. In earlier versions of the product, data stewards would act as the gatekeepers to the product and decide which data sets to expose to ClearStory. Then there were authors, who created the storylines and dashboards that business users would consume. As the third mode in ClearStory, StoryBoards is designed for a bigger audience of users who need access to powerful analytics.
“It’s no longer up to a single subject matter expert to be guiding or directing the analysis or what data sources are involved,” says Scott Anderson, sales engineer at ClearStory Data. “Any end user that has the data steward persona can easily blend in new data sources, hit upload off a flat file off desktop, and throw that up against any data pulling out of your Hadoop cluster or your backend relational warehouse environment.”
StoryBoards will be an iterative update for existing ClearStory customers, who have been using the software to gain greater insight into their own customers’ behaviors and sentiment by exploiting big internal and external data feeds coming from point of sale (POS) systems, social media streams, mobile devices, medical records, and the U.S. Census, not to mention curated feeds from Dun & Bradstreet and Nielsen. The combination of Spark and ClearStory’s algorithms does the hard work of ensuring that the sizes and granularities of different data streams are normalized and ready to be analyzed, while the front-end visualization tools help the user translate insights into easy-to-consume graphs.
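To make the idea of normalizing granularities concrete, here is a minimal sketch in plain Python. It is not ClearStory's engine and does not use Spark; the feed contents, field names, and the helper functions `roll_up_to_daily` and `join_feeds` are all hypothetical. It only illustrates the general technique: an hourly feed is rolled up to the daily granularity of a second feed so the two can be joined on a shared key.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly point-of-sale feed: (ISO timestamp, store, sales amount).
pos_hourly = [
    ("2014-10-13T09:00", "store_1", 120.0),
    ("2014-10-13T15:00", "store_1", 80.0),
    ("2014-10-14T10:00", "store_1", 200.0),
]

# Hypothetical daily social-media feed: (ISO date, store, mention count).
mentions_daily = [
    ("2014-10-13", "store_1", 14),
    ("2014-10-14", "store_1", 31),
]


def roll_up_to_daily(rows):
    """Sum hourly amounts into (date, store) buckets."""
    totals = defaultdict(float)
    for timestamp, store, amount in rows:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        totals[(day, store)] += amount
    return dict(totals)


def join_feeds(daily_sales, daily_mentions):
    """Inner-join the two feeds on their shared (date, store) key."""
    mentions = {(day, store): count for day, store, count in daily_mentions}
    return [
        {"date": day, "store": store, "sales": sales,
         "mentions": mentions[(day, store)]}
        for (day, store), sales in sorted(daily_sales.items())
        if (day, store) in mentions
    ]


# Both feeds now share one granularity (daily) and one key (date, store).
harmonized = join_feeds(roll_up_to_daily(pos_hourly), mentions_daily)
for row in harmonized:
    print(row)
```

A production system would also have to reconcile mismatched keys, time zones, and late-arriving data, and would distribute the work across a cluster; this sketch leaves all of that out.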
But for new customers who are accustomed to sharing static reports and dashboards via email, the new StoryBoards feature will likely represent a major change. “Dashboards as we know them are dead,” Mulligan says. “Every single scene or frame [in StoryBoard mode] contains more data than you can ever have in a dashboard. The volume and variety in each frame is far more than what you can have in a dashboard, and it’s live.”
For ClearStory, none of this would be possible without the power of Spark, which does things that ClearStory’s architects would have struggled to achieve in MapReduce. “Spark is making its name in the development community as the actual promise that Hadoop was supposed to provide years and years ago,” Anderson says. “And being the first commercialized solution built on Spark, a lot of what we bring to the table is figuring out how to serve Spark at scale in a multi-tenant environment across thousands and thousands of users.”
As the big data story unfolds, it’s clear that organizations will need increasingly powerful tools to make sense of ever-larger data sets. Tools like ClearStory’s that help democratize access to data will be well positioned to succeed.