February 11, 2014

Alpine Debuts ‘Chorus’ Line for Big Data

Alex Woodie

Greenplum spin-off Alpine Data Labs is best known as a developer of proprietary algorithms that do the hard work of crunching vast data sets in Hadoop and other big data platforms. With today’s launch of a new product called Chorus, Alpine is borrowing ideas from the social media world for the purpose of bringing together all the big data stakeholders in an organization.

Data science is largely an individual sport, with many silos of skill that aren’t easily shared. Data scientists do the hard work of synthesizing data, writing algorithms, and creating models, while data analysts are tasked with keeping the good data flowing for the business users.

Each of these groups has a different skill level and uses different products to accomplish its tasks. The data scientist may tinker with the MapReduce jobs, while the analysts ensure that the results of those jobs are viable for power users to consume in Tableau. There are tools that cross over (Excel, for instance, is ubiquitous), but there is no single overarching platform that brings it all together.

That is Alpine’s plan with Chorus: to build a single, overarching platform that brings together multiple data analytic products and their users. And it does so using the Web 2.0 metaphors of Google (for search) and Facebook (for sharing).

“We brought the hardcore predictive analytics capabilities of Alpine together with a Web-based team approach,” Alpine chief product officer Steve Hillion says. “It’s the core data science capability of Alpine wrapped up in team based collaborative manner. It really helps the whole team work together on the analytics workflow.”

From the Chorus Web portal, users can view the data sets available to them, whether they’re Hadoop clusters, massively parallel processing (MPP) platforms like Greenplum, or a traditional Oracle database. They can search these data sets, pull the interesting pieces, and visualize the data through plug-ins for third-party products, such as Tableau. As the users work with the data, they can add annotations and comments, which are tracked by the Chorus software. They can even work with Hadoop jobs and initiate them as needed.

“With Chorus I can not only find and annotate those data sets and start asking questions about them, but I can right click on them and kick off an advanced analytics modeling session,” Hillion says. “It is one-stop shopping for everything from exploring data through to building models to pushing models into production.”

The Alpine Data Labs team demonstrated the new system to Datanami last week, ahead of today’s unveiling at the Strata show taking place this week in Santa Clara, California. A common use case for big data is reducing customer churn, and Chorus could accelerate this process.

Effectively isolating the churn signal in a big data set is not an easy task and requires close examination of the data, typically by a data scientist or an experienced data analyst. In the demo, Chorus was used to show others in a group that the churn signal had already been identified, which allowed the next step to be taken. “This is really about getting the team together, not just to talk about it, but to act on it,” says chief marketing officer Bruno Aziza.

The company built Chorus in the open source realm, and is counting on partners to build adapters for accessing big data analytic products, from ETL tools to visualization and reporting tools, within the Chorus product. Visualization vendors QlikTech and Tableau are already represented, as are any JDBC-compliant data sources. The company has talked with the folks at data cleansing startup Trifacta, and one of its partners is currently working on an R plug-in.

The idea is to make big data analytic projects less difficult to get started and easier to keep going. Calling it “Facebook for big data analytics” may be an oversimplification, Hillion says, but it’s spot-on in terms of the collaborative, team-oriented way that Alpine thinks analytics should be conducted.
