The era of big data has brought a resurgence of interest in the R statistical language as an approach to analytics. Accordingly, a number of startups have aimed at increasing R’s usability and functionality, while other companies, often established players in the analytics market, have sought to integrate statistical capabilities into their existing platforms.
Among the latter group is Lavastorm, which just announced an interface between its own analytics platform and the R statistical language. The interface allows users to boost their processes with the power of the sturdy open source stats language without the need to implement an internal R server.
As noted, this is not the first company to explore the options around commercializing a language that was rooted in academia for much of its development cycle. The last couple of years have revealed that companies like Revolution Analytics, for instance, are able to sustain a growing business on the promise of extending R’s capabilities beyond the traditional purview.
Whereas Revolution has implemented performance and scalability improvements to core R through technology enhancements to the platform itself, Lavastorm is doing something quite different: offering a complementary solution that it says provides better tools for working with the data before R analyses are applied.
The company’s CTO, Rich Boccuzzi, described Lavastorm’s foray into R as a means to tap into new stats possibilities, claiming that users of its existing platform can now execute R scripts using data housed in the Lavastorm Analytics Platform and receive results back into the platform for further integration and analysis. Boccuzzi is no stranger to the company’s analytics platform and engine, having been involved with the development process behind it since joining Lavastorm in 1999.
Boccuzzi believes that all the attention around big data is drawing more interest not only to analytics and business intelligence options, but also to more established approaches to analysis like trusty old R. As he told us, “One interesting facet of the big data explosion has been the expansion in the set of people who must become data analysts. This requirement isn’t usually satisfied cheaply or easily with software, and it often requires a technical skill set which traditional analysts may not have or easily obtain.”
Just as R itself has been around for ages, Lavastorm goes way back with big data analytics and business intelligence. With MIT research roots that blossomed into business shoots in 1993, the company found a home in the telecom industry in particular with rather basic database-driven operational tools. As the needs of businesses grew more complex, the company expanded into new markets with the advent of its Analytics Platform and Analytics Engine, which aim to merge, clean and define diverse data types for analysis. The company claims that its approach is robust enough for data-intensive projects like fraud detection, optimization and healthcare analytics, with the ability to analyze 3 billion records per day across multiple analytic processes, although as we might imagine, it’s difficult to rely on such general numbers for such diverse data and application types.
Boccuzzi says the real business value of R for this expanding group of analysts is that it “provides a great entry point for self-sufficient analytical work, since it doesn’t incur large infrastructure costs and it offers so much functionality off the (community-driven) shelf. When you put this much power in more people’s hands, you see opportunities for application of sophisticated analytics where it would have been cost-prohibitive before.”
As Boccuzzi explained, the problems Lavastorm is trying to solve with its R tie-in revolve around the challenges of assembling data from across enterprise silos and federating them into a comprehensive and trustworthy foundation for R analyses. He says that to make this more seamless, the Lavastorm platform provides a visual front end for designing data acquisition, federation, and analysis applications and a powerful and scalable back end for processing these applications.
“We see the Desktop flavor of our offering (which is available in a free Public Edition) as a great way for individuals to manage the data they’ll use with R, and also to enhance their R-based analytics by leveraging our set of components to ensure the integrity of the data,” he explained. Boccuzzi noted that the visual, component-based nature of the company’s software allows R users to create drag-and-drop components which contain complex data manipulations and R analyses but which can be packaged for use by analysts who may not themselves be proficient in R. For example, an R user may combine a sequence of data filters and joins that prepare the data with a linear regression R script into a single node, which another analyst may then execute without knowing exactly how these operations were performed. This allows organizations to make R-based statistical analyses available and useful to a wider audience. “We believe the combination of our software with R represents a major enhancement to the way R is being used, and can broaden the set of users who can leverage R’s power,” Boccuzzi added.
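To make the "single node" idea concrete, here is a minimal sketch in Python rather than R. It is not Lavastorm's software or API; every name in it (`filter_rows`, `inner_join`, `fit_line`, `regression_node`, and the `customer_id`, `active`, `minutes` and `bill` fields) is hypothetical. It simply assumes a filter step, a join step and an ordinary least-squares fit wrapped behind one function, so that a caller can run the whole chain without seeing the steps inside.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def filter_rows(rows: List[Row], keep: Callable[[Row], bool]) -> List[Row]:
    """Data-preparation step: keep only the rows that satisfy the predicate."""
    return [row for row in rows if keep(row)]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Data-preparation step: inner join on a shared key column.

    Assumes each key value appears at most once on the right-hand side.
    """
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_node(customers: List[Row], usage: List[Row]) -> Tuple[float, float]:
    """Hypothetical packaged node: filter, join, then regress bill on minutes.

    The caller supplies two tables and receives (slope, intercept); the
    intermediate steps stay hidden, as with a packaged component.
    """
    active = filter_rows(customers, lambda row: row["active"] == 1)
    joined = inner_join(active, usage, "customer_id")
    minutes = [row["minutes"] for row in joined]
    bills = [row["bill"] for row in joined]
    return fit_line(minutes, bills)


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    customers = [
        {"customer_id": 1, "active": 1},
        {"customer_id": 2, "active": 1},
        {"customer_id": 3, "active": 0},
    ]
    usage = [
        {"customer_id": 1, "minutes": 100, "bill": 20},
        {"customer_id": 2, "minutes": 200, "bill": 30},
        {"customer_id": 3, "minutes": 300, "bill": 95},
    ]
    print(regression_node(customers, usage))
```

An analyst using such a node would only ever call `regression_node`; whoever built it remains free to change the filters, joins or model inside without affecting that caller.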
In terms of pushing a business model beyond its bread-and-butter platform users, the Lavastorm CTO explains that the company is simply trying to provide users of R with a better end-to-end approach, from data access through results presentation. “You can use free R with the free version of Lavastorm software and be quite productive. We see the commercial potential for Lavastorm Analytics’ R integration in the desire to work with larger data sets and to do more with the data before and after they are treated with R. So while our R integration is now and will remain free, we believe that exposing users to our technology will provide a compelling offering for which they will pay a premium.”
While Lavastorm may not address some of the ultra high-end needs of data- and compute-intensive tasks as firmly as solutions that leverage Hadoop, in-memory platforms or vast streams of complex data, for its purposes, R could potentially boost the platform’s use in mid-sized operations for data mining and optimization, as well as for building models for specific tasks.