October 17, 2012

Lavastorm Enlivens Platform with R


The era of big data has brought a resurgence of interest in R, the open source statistical language. A number of startups have set out to improve its usability and functionality, while other companies (often established players in the analytics market) have sought to integrate statistical capabilities into their existing platforms.

Among the latter group is Lavastorm, which just announced an interface between its own analytics platform and the R statistical language. The interface lets users boost their processes with the power of the sturdy open source stats language without the need to implement an internal R server.

As noted, this is not the first company to explore the options around commercializing a language that was rooted in academia for much of its development cycle. The last couple of years have revealed that companies like Revolution Analytics, for instance, are able to sustain a growing business on the promise of extending R’s capabilities beyond the traditional purview.

Whereas Revolution has implemented performance and scalability improvements to core R through technology enhancements to the platform itself, Lavastorm is doing something quite different: offering a complementary solution that it says provides better tools for working with the data before R analyses are applied.

The company’s CTO, Rich Boccuzzi, described Lavastorm's foray into R as a means to tap into new statistical possibilities. He said users of its existing platform can now execute R scripts on data housed in the Lavastorm Analytics Platform and receive the results back into the platform for further integration and analysis. Boccuzzi is no stranger to the company’s analytics platform and engine, having been involved in their development since joining Lavastorm in 1999.

Boccuzzi believes that all the attention around big data is drawing more interest not only to analytics and business intelligence options but also to more established approaches to analysis like trusty old R. As he told us, “One interesting facet of the big data explosion has been the expansion in the set of people who must become data analysts. This requirement isn’t usually satisfied cheaply or easily with software, and it often requires a technical skill set which traditional analysts may not have or easily obtain.”

Just as R itself has been around for ages, Lavastorm goes way back with big data analytics and business intelligence. With MIT research roots that blossomed into business shoots in 1993, the company found a home in the telecom industry in particular, with rather basic database-driven operational tools. As the needs of businesses grew more complex, the company expanded into new markets with the advent of its Analytics Platform and Analytics Engine, which aim to merge, clean and define diverse data types for analysis. The company claims its approach is robust enough for data-intensive projects like fraud detection, optimization and healthcare analytics, with the ability to analyze 3 billion records per day across multiple analytic processes. As we might imagine, though, it’s difficult to rely on such general numbers for such diverse data and application types.

Boccuzzi says the real business value of R for this growing group of analysts is that it “provides a great entry point for self-sufficient analytical work, since it doesn’t incur large infrastructure costs and it offers so much functionality off the (community-driven) shelf. When you put this much power in more people’s hands, you see opportunities for application of sophisticated analytics where it would have been cost-prohibitive before.”

As Boccuzzi explained, the problems Lavastorm is trying to solve with its R tie-in revolve around the challenges of assembling data from across enterprise silos and federating them into a comprehensive and trustworthy foundation for R analyses. He says that to make this more seamless, the Lavastorm platform provides a visual front end for designing data acquisition, federation, and analysis applications, and a powerful and scalable back end for processing these applications.

“We see the Desktop flavor of our offering (which is available in a free Public Edition) as a great way for individuals to manage the data they’ll use with R, and also to enhance their R-based analytics by leveraging our set of components to ensure the integrity of the data,” he explained. Boccuzzi noted that the visual, component-based nature of the company’s software allows R users to create drag-and-drop components which contain complex data manipulations and R analyses but which can be packaged for use by analysts who may not themselves be proficient in R. For example, an R user may combine a sequence of data filters and joins that prepare the data with a linear regression R script into a single node, which another analyst may then execute without knowing exactly how these operations were performed. This allows organizations to make the R-based statistical analyses available and useful to a wider audience. “We believe the combination of our software with R represents a major enhancement to the way R is being used, and can broaden the set of users who can leverage R’s power.”
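To make the idea of a packaged node concrete, here is a minimal sketch of the pattern Boccuzzi describes: a filter, a join, and a linear regression wrapped behind a single call. It is not Lavastorm's software or API. Python stands in for the R script, and every function name, field name, and data value below is hypothetical and chosen only for illustration.

```python
from statistics import mean


def filter_rows(rows, predicate):
    """Data filter step: keep only rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def join_rows(left, right, key):
    """Inner join step: merge rows from two sources that share a key value."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def usage_vs_revenue_node(accounts, usage):
    """Hypothetical packaged 'node': filter, join, then regress in one call."""
    active = filter_rows(accounts, lambda row: row["active"])
    merged = join_rows(active, usage, key="account_id")
    slope, intercept = fit_line(
        [row["minutes"] for row in merged],
        [row["revenue"] for row in merged],
    )
    return {"slope": slope, "intercept": intercept, "rows_used": len(merged)}


# Made-up sample data from two separate "silos".
accounts = [
    {"account_id": 1, "active": True, "revenue": 30.0},
    {"account_id": 2, "active": True, "revenue": 50.0},
    {"account_id": 3, "active": False, "revenue": 10.0},
    {"account_id": 4, "active": True, "revenue": 70.0},
]
usage = [
    {"account_id": 1, "minutes": 100},
    {"account_id": 2, "minutes": 200},
    {"account_id": 3, "minutes": 50},
    {"account_id": 4, "minutes": 300},
]

result = usage_vs_revenue_node(accounts, usage)
print(result)
```

An analyst calling `usage_vs_revenue_node` only supplies the two data sets and reads back the fitted coefficients. The preparation steps and the statistics stay hidden inside the node, which is the division of labor the company says its drag-and-drop components are meant to provide.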

In terms of pushing a business model beyond its bread-and-butter platform users, the Lavastorm CTO explains that the company is simply trying to provide users of R with a better end-to-end approach, from data access through results presentation. “You can use free R with the free version of Lavastorm software and be quite productive. We see the commercial potential for Lavastorm Analytics’ R integration in the desire to work with larger data sets and to do more with the data before and after they are treated with R. So while our R integration is now and will remain free, we believe that exposing users to our technology will provide a compelling offering for which they will pay a premium.”

While the platform may not address the ultra high-end needs of data- and compute-intensive tasks as firmly as solutions that leverage Hadoop or in-memory platforms, or those built for vast streams of complex data, the R integration could boost its use among mid-sized operations for data mining and optimization, as well as for building models for specific tasks.
