Hybrid Analytics Yield Big Data Flexibility
You have a lot of choices when it comes to data analytics. Should you run in the cloud or on premises? Use samples of data or the whole enchilada? Structured or unstructured? Traditional tech or Hadoop? Data scientists with Ph.D.s or business analysts with MBAs? Typically, one would pick an approach and stick with it. But increasingly, users can take a hybrid approach that mixes different technologies, methodologies, and personnel to best suit their needs.
Gartner analyst Mark Beyer is credited with coining the term “logical data warehouse” in 2011 to highlight the idea that one’s analytics shouldn’t be hemmed in by technological choices. The LDW concept (if not the term itself) has since been embraced by many of the big data warehouse providers, such as Teradata, Oracle, IBM, EMC, and Actian. In many cases, the LDW offers a path for integrating traditional analytic products with new data architectures, such as Hadoop and NoSQL. Similarly, NoSQL database makers and Hadoop distributors have adopted LDW precepts to appease a customer base wary of the technology’s rapid pace of change.
Now Dell Software is taking up the LDW cause with its collection of analytic offerings. Today at Dell World, the computer giant made two announcements that will free its customers to pursue their big data analytic dreams, wherever those dreams may lead.
The first announcement is the integration of Statistica, which it obtained with its March acquisition of StatSoft, and Kitenga, which it acquired two years ago. Statistica is Dell’s established statistical package that’s used by about a million people around the globe. With about 16,000 functions, it’s used mostly by hard-core statisticians with Ph.D.s who want to create models on samples of structured data, such as financial or transactional data.
Kitenga, meanwhile, is a newer product that helps users analyze semi-structured data within the confines of newer architectures like Hadoop and MongoDB. Its algorithms enable users to perform natural language processing (NLP), machine learning, predictive modeling, and sentiment analysis on data gathered from social media and the Internet of Things.
Bringing these two products together gives users better options for performing whatever analytics they may want to pursue, says John Thompson, general manager of global advanced analytics in Dell Software’s Information Management group.
“Statistica is a very mature and rich predictive analytics platform that has been able to connect to structured data sources for many years,” he says. “Now we’re opening it up and saying, in addition to all those structured data sources, now you can go after pretty much anything: social media data, text files, JSON, log files, semi-structured data” via the Kitenga integration.
And all the logistic regression and neural network models that data scientists built in Statistica can now be exported and run pretty much wherever they like. “Those different analytics can be sent out and dropped down to relational platforms like Teradata or Oracle, or sent to a Hadoop distribution, and the analytics can be run as a traditional sample or on full volume data,” he says.
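The sample-then-score pattern Thompson describes can be sketched in a few lines. This is a generic NumPy illustration, not Statistica’s actual export mechanism or API: a logistic regression model is fit on a small sample, and the fitted coefficients then score the full-volume dataset. All data and parameters here are synthetic.

```python
# Generic sketch of "build on a sample, score on full volume."
# Not Statistica code; just the underlying pattern with NumPy.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "full volume" data: 100,000 rows, 3 features,
# with labels driven by a known linear rule.
X_full = rng.normal(size=(100_000, 3))
y_full = (X_full @ np.array([1.5, -1.0, 0.5]) > 0).astype(float)

# Traditional statistician's approach: fit the model on a 1% sample.
idx = rng.choice(len(X_full), size=1_000, replace=False)
X_s, y_s = X_full[idx], y_full[idx]

# Plain gradient descent on the logistic loss.
w = np.zeros(X_s.shape[1])
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_s @ w)))       # predicted probabilities
    w -= 0.1 * (X_s.T @ (p - y_s)) / len(y_s)  # gradient step

# The model built on the sample now scores every row of the full set.
scores = 1.0 / (1.0 + np.exp(-(X_full @ w)))
accuracy = ((scores > 0.5) == y_full).mean()
```

In Dell’s framing, the same trained model could instead be shipped to a relational platform or a Hadoop cluster and scored there, so the full-volume pass happens where the data lives rather than in the modeling tool.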
The integration of Kitenga and Statistica will also help bring together different classes of users. On the one hand, Statistica users are skilled in data sampling and modeling. “Those are people who are trained statisticians and steeped in analytical methodologies,” Thompson says. “Then there’s the business analysts who really would like to look at data in full volume. Bringing those together enables them to do it one way or another.”
This marriage of traditional and newer analytic approaches lets users analyze data in a way that’s more natural to them, he says. “It gives different user populations the ability to access those different worlds of data,” Thompson tells Datanami. “We’re very interested in enabling people to have different implementation paradigms, different environments they work in, without us forcing them to do a rip and replace or work completely in one paradigm or the other.”
The second part of the announcement involves Dell’s close partnership with Microsoft. As a result of work the companies have done, Statistica users can now export their jobs to run on Microsoft Azure Machine Learning, a cloud-based service the company unveiled earlier this year.
Thompson says he’s very impressed with the Azure ML cloud service, in particular the way the interface gives users access to the 15 to 20 machine learning algorithms and how it integrates with people’s existing data analytics workflows. “We see the ability to have the flexibility in on-prem and cloud environments allowing quite a bit of freedom for people to implement their solution in a way that makes sense to them,” he says.
Having some analytics run on data stored in the cloud while others run on premise may seem to be an invitation for relentless complexity, but it’s a model that Dell is quite familiar with, Thompson says.
“We have a lineage and history of companies like GlaxoSmithKline and Pfizer…using our technology to run their global predictive analytics environments,” he says. “They don’t have those installed as a monolithic single location installation. They’re installed all around the world…So when we saw the Azure ML system we said well this is very easy. This is a paradigm we work in today.”
As the big data analytics world evolves, hybrid solutions that involve multiple systems running on-prem and in the cloud will increasingly become the norm. Couple that trend with the ongoing democratization of powerful analytic functions and the decreasing need for skilled data scientists, and you have a recipe that will benefit many organizations in the coming years.