September 13, 2013

Pentaho Goes All In with Big Data Blending

Alex Woodie

The advent of NoSQL and Hadoop data stores is giving organizations powerful tools to collect vast amounts of data. However, driving actionable insights out of that data–and doing so in a reasonable timeframe–is easier said than done. Now, data analytics firm Pentaho thinks it has a solution for that dilemma with its new “blend at the source,” near real-time data integration capability for big data stores.

The new data blending function unveiled this week is designed to get around some of the challenges that have prevented analysts from making the fullest and best use of all their data–not just the Hadoop- and NoSQL-based data, but the relational data that organizations still collect and analyze, too.

In the traditional world of business intelligence, ETL tools would move all data to be analyzed into a centralized data warehouse, where it’s washed, straightened, primped, curled, tinted, and otherwise made presentable to the business analysts tasked with driving the tools that find the nuggets of truly useful and actionable information.

This model worked pretty well when all of the data was structured and in a relational format. But it completely falls apart when you introduce big, unstructured, and highly variable data into the equation. It’s simply not feasible to move all of this big data into the traditional warehouse for centralized reporting. In fact, that’s why many organizations bought Hadoop and NoSQL in the first place–because their data didn’t fit into their regular data warehouses!

But this new distributed approach presents a real roadblock when one wants to create analytical views that combine the exciting new data with the boring old data. Analysts may have no trouble identifying which pieces of data in NoSQL or Hadoop they want to see in a view next to data from traditional data stores. But the actual process of integrating these data stores can create problems.

Pentaho says it has addressed this problem with the new “blend at the source” data integration capability introduced in Pentaho Business Analytics 5.0. This function enables analytical views and dashboards created with the Pentaho tool to pull data from big data stores and blend it with other data sources on an as-needed basis.

This function will appeal to organizations that place a heavy emphasis on customer retention, according to Donna Prlich, Pentaho’s Director of Big Data Products and Marketing.

“The pattern that we’ve seen with our customers is that they’re trying to look more closely at their customer and figure out how to retain those customers or to get more revenue,” Prlich says in an interview with Datanami. “What they’re trying to do is take some of that data that’s coming in and then blend it with data that might traditionally be in a data warehouse.”

Telecommunications and e-commerce are two of the top use cases for this type of blending. In Pentaho’s telecommunications example, data is being mixed from two disparate sources: the traditional data warehouse, which stores structured data collected from transaction and customer care systems, and the new MongoDB data store set up to collect streaming data about dropped calls, outages, and call quality.

In the old days, a customer service representative (CSR) at the telecommunications firm would not have had immediate access to a dashboard showing that Customer X always paid his bill on time, but that his mobile phone calls have been dropping at a high rate. And in the old days, after Customer X called to complain about the poor service, the CSR would have been powerless to do anything about the customer’s threats to defect to the phone company’s archrival. With real-time data blending in place, the CSR would be empowered to offer the customer a discount in recognition of the dropped calls.
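
To make the telecommunications scenario concrete, here is a minimal Python sketch of the kind of join it implies. This is not Pentaho’s implementation or API. The record layouts, the `blend` function, and the 50 percent drop-rate threshold are all hypothetical, and plain in-memory lists stand in for the data warehouse and the MongoDB store.

```python
from collections import defaultdict

# Hypothetical rows standing in for billing history in a relational warehouse.
warehouse_rows = [
    {"customer_id": 1, "name": "Customer X", "late_payments": 0},
    {"customer_id": 2, "name": "Customer Y", "late_payments": 6},
]

# Hypothetical call-event documents standing in for a document store.
call_events = [
    {"customer_id": 1, "dropped": True},
    {"customer_id": 1, "dropped": True},
    {"customer_id": 1, "dropped": False},
    {"customer_id": 2, "dropped": False},
    {"customer_id": 2, "dropped": False},
]


def blend(rows, events, drop_threshold=0.5):
    """Join billing rows with per-customer dropped-call rates.

    The threshold is an illustrative assumption, not a vendor default.
    """
    totals = defaultdict(int)
    drops = defaultdict(int)
    for event in events:
        totals[event["customer_id"]] += 1
        if event["dropped"]:
            drops[event["customer_id"]] += 1

    blended = []
    for row in rows:
        cid = row["customer_id"]
        total = totals.get(cid, 0)
        rate = drops.get(cid, 0) / total if total else 0.0
        good_payer = row["late_payments"] == 0
        blended.append({
            **row,
            "drop_rate": rate,
            # Flag reliable payers who are seeing many dropped calls.
            "offer_discount": good_payer and rate >= drop_threshold,
        })
    return blended


view = blend(warehouse_rows, call_events)
for record in view:
    print(record["name"], round(record["drop_rate"], 2), record["offer_discount"])
```

In this toy data, Customer X pays on time and has two of three calls dropped, so the blended view flags him for a discount, while Customer Y is not flagged.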

This new in-place blending–which Pentaho claims is a unique offering not currently available from other data analytic software vendors–can also be used in an e-commerce setting. Perhaps a customer is on the phone with an online retailer to discuss a recent order. “Maybe they’re able to pull in some weblog data [that shows the retailer] what they most recently were looking at online,” Prlich says. “You could say, ‘Hey I noticed you were looking at these blue shoes the other day,’ and offer them a special promotion on the spot.”

Of course, organizations have always been free to blend any data–even really big data–to their hearts’ content. What makes Pentaho’s “at the source” blending notable comes down to semantics, according to Chuck Yarbrough, a member of Pentaho’s product marketing team.

“We’re rapidly moving into an era of distributed data analytic architectures where data should remain in its most optimal store,” Yarbrough said in a video. “Blending data at the source maintains the integrity of necessary rules of governance and security of the data. It’s better than end-user blending away from the source, which lacks the underlying data semantics and risks inaccurate results.”

Pentaho historically has been strong on the integration side of things, even if it wasn’t as strong in other areas, according to research firm Gartner. The data blending features will build on Pentaho’s strength in integration, but they won’t be the last integration features you’ll see out of the firm.

Just don’t ask what the next one will be.

“What we’ve done on the front-end for administrators and end users is built around the concept of, we don’t know what the next type of data will be,” Prlich says. “So when we’re talking about big data and unstructured data–there’s data from devices, there’s Twitter data, there’s Google Analytics data. OK, but what’s next? You can’t really predict…. The benefit of Pentaho is that … we’re able to morph as needed to address whatever’s coming next.”

In addition to big data blending, the company unveiled a major overhaul of the UI, new data connectors for MongoDB, and about 100 other features.
