The New Data Blending Mandate
An axiom of big data says that, as data volume and complexity grow, it becomes harder for organizations to extract meaning from their data. One solution gaining momentum is data blending, which puts a real-time, analytics-oriented twist on the batch-oriented ETL data integration processes of old.
Data blending is a concept that’s gaining steam among analytics tool providers because it offers a relatively quick and straightforward way to extract value from multiple data sources and find patterns among them, without the time and expense of traditional data warehouse processes. Data blending isn’t a cure-all for big data challenges, and not all types of data are amenable to being blended with others. But certain big data use cases are emerging as clear candidates where data blending makes a lot of sense.
For example, companies in the consumer packaged goods (CPG) industry are exploring ways to blend, in real time, the customer sentiment data flowing across social media with their existing demand forecasts. According to Alteryx president and COO George Mathew, CPG companies are looking to use data blending to get information that simply isn’t available through traditional point-of-sale (POS) sources.
“When a CPG launches a new product into the market and uses a relevant hashtag, they can get a very good sense of A. What’s the volumetric around a product launch; B. What’s the positive or negative sentiment; and C. How does that marry to my demand forecast?” Mathew tells Datanami. “You want to be able to go in and get a strong picture across what’s occurring in social media, particularly in Facebook and Twitter and bring that into your analytic environment.”
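The kind of blend Mathew describes, marrying mention volume and sentiment to a demand forecast, can be sketched in a few lines. The following is a hypothetical illustration only, not Alteryx's implementation; the hashtag, sentiment labels, and forecast figure are all invented:

```python
# Hypothetical sketch: blend social sentiment with a demand forecast
# on a shared product hashtag. All names and numbers are invented.
from collections import Counter

# Simulated social feed: (product_hashtag, sentiment) pairs
social_feed = [
    ("#NewSnack", "positive"), ("#NewSnack", "positive"),
    ("#NewSnack", "negative"), ("#NewSnack", "positive"),
]

# Simulated internal demand forecast, in units
forecast = {"#NewSnack": 12000}

def blend(feed, forecast):
    """Merge mention volume and positive-sentiment share into the forecast view."""
    volume = Counter(tag for tag, _ in feed)
    positive = Counter(tag for tag, s in feed if s == "positive")
    blended = {}
    for tag, units in forecast.items():
        n = volume[tag]
        blended[tag] = {
            "forecast_units": units,
            "mention_volume": n,                          # Mathew's "volumetric"
            "positive_share": positive[tag] / n if n else None,
        }
    return blended

print(blend(social_feed, forecast))
```

In practice the sentiment labels would come from a feed such as DataSift or Gnip rather than a hard-coded list, but the blending step itself is just a keyed merge of the two views.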
With today’s release of Analytics 9.0, Alteryx announced that several new data sources can now be part of the data blending capabilities the company has long offered in its flagship analytic product. That includes support for the social media data feeds from DataSift and Gnip (which was just acquired by Twitter); the contents of analytic databases from Pivotal, HP Vertica, and Amazon Redshift; and data that customers may have stored in Marketo and Google Analytics.
Just as the best chefs start out following recipes, Alteryx helps budding data analysts blend their data using pre-built components that are available in the product gallery. Once the analysts start playing around with the different ways that data can be blended, the sky is the limit.
“These sources need to be much more in the hands of the users who are trying to understand, how do I take multiple internal and external sources and pull them together into a cohesive analytic data set that I can model off of?” Mathew says. “We know that having separate tools for separate functions to do that is basically a latency and delay in how the work gets accomplished and done in organization, so a lot of Alteryx’s focus and intent is to really bring those two disciplines closer together–data blending and advanced analytics.”
Another analytics tool provider firmly in the data blending camp is Pentaho, which added “at the source” data blending capabilities in last fall’s launch of Business Analytics 5.0. Pentaho takes the view that data blending ought to be done early and often in the analytics workflow, because it will simply take too long to merge the pertinent data sets after the fact using traditional ETL tools and data warehouses, and any opportunity to act on the insights from blended data will have passed.
“I absolutely believe that the blending of the data [should occur] as far down at the source as possible,” Pentaho’s new chief product officer Chris Dziekan tells Datanami. “Do I have to wait for the IT person to do everything, or can I move the skill set up a bit to the data developer or the data analyst? I really do believe we should be focusing on that secret layer in the middle, blending the data effectively, providing on the glass the experience that allows you to analyze information and see it in big form.”
Other vendors are following suit in big data blending, including Tableau Software, which supports some data blending in its visual data discovery tool. In version 6, for example, the vendor delivered the capability to have two or more data sources blended automatically when the software detects a common field among them. Tableau is widely regarded as a leading business intelligence product today, so it will be interesting to see how it evolves its data blending capabilities, such as adding support for less-structured data.
MicroStrategy has also added data blending to its repertoire. With the launch of Analytics Enterprise 9.4 in late 2013, the company introduced the capability to combine data from multiple sources–such as sales data from Excel and demographic data from the US Census Bureau–without requiring a separate integration tool and without involving IT. “We believe that we have transformed the self-service analytics market with our new on-the-fly data blending technology,” says MicroStrategy president Paul Zolfaghari in a press release.
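At bottom, this style of self-service blending, joining spreadsheet sales figures to Census demographics without an integration tool, amounts to detecting a field the two sets share and matching rows on it. Here is a minimal, hypothetical sketch of that idea; the table and column names are invented, and real products handle type coercion, ambiguous keys, and unmatched rows far more carefully:

```python
# Hypothetical sketch of on-the-fly blending: find a field name shared
# by two data sets and join rows on it. All names are invented.

def common_fields(rows_a, rows_b):
    """Return the set of field names present in both data sets."""
    return set(rows_a[0]) & set(rows_b[0])

def blend_on_common_field(rows_a, rows_b):
    """Left-join rows_b onto rows_a using one automatically detected key."""
    key = sorted(common_fields(rows_a, rows_b))[0]
    index = {row[key]: row for row in rows_b}
    return [{**a, **index.get(a[key], {})} for a in rows_a]

# Simulated sales spreadsheet and demographic extract sharing "region"
sales = [{"region": "West", "revenue": 100}, {"region": "East", "revenue": 80}]
demographics = [{"region": "West", "population": 500},
                {"region": "East", "population": 300}]

print(blend_on_common_field(sales, demographics))
```

The point of products like MicroStrategy's and Tableau's is that this detect-and-join step happens inside the analysis tool at query time, rather than in a separate ETL pipeline built by IT.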
Jaspersoft, meanwhile, is also jumping on the data blending train. The San Francisco company’s February release included the capability to blend data from NoSQL and relational databases in a GUI environment. “Companies are harnessing data stored in big data stores like Hadoop and MongoDB to provide immediate insights in the context of a business process where it’s needed most,” Jaspersoft vice president Karl Van den Bergh says in a statement.
As the number and variety of data sources continue to grow, so too will business users’ desire to access and make sense of that data. This trend has been building for years, but unlike in the past, companies today increasingly do not have the luxury of time to digest and sort data. Companies that adopt tools allowing their users to visually mash up their data will have a competitive edge when it comes to reacting to big data flows.