Statistica Embraces Python In Bid for ‘Collective Intelligence’
Organizations that use the new release of Dell’s Statistica software package can now incorporate separate Python models into their analytic workflows. It’s the latest in the computer giant’s new open approach to advanced analytics, dubbed “collective intelligence.”
In the world of advanced analytics, there are often two camps: those who place their faith in proprietary suites from the likes of Statistica, SAS, MathWorks, and IBM’s SPSS, and those who embrace open source tools like Python, R, S (a precursor to R), and Weka. Adherents on each side will fiercely defend the advantages of their approach, with little common ground in between.
Now Dell is extending an olive branch toward the open source tools that would ostensibly compete with Statistica, the proprietary suite of analytic software the Round Rock, Texas, company obtained with its 2014 acquisition of StatSoft.
With today’s launch of Statistica version 13.1, the company is supporting Python, which has rapidly become one of the most popular languages for developing advanced analytic and predictive models. It’s not the first time Dell has embraced open source tools, according to John Thompson, the general manager for Advanced Analytics at Dell Software.
“For a long time–six, seven releases–we’ve been able to bring R into our workflows, bring them into our model management system and secure and govern and version it there,” he says, “and with 13.1 we’re adding Python to the environment.”
Thompson further explains the rationale:
“Our view is there will be a wide range of data scientists and business analysts involved in the advanced analytic market, and many of them will want to use other tools than large installed proprietary tools,” he says. “If they say, ‘Hey I have a workflow and it’s going to have seven models, and five of them are going to be in R and two in Python,’ we can bring that in and seamlessly work with that environment without any changes whatsoever–right off the shelf, right out of the gate.”
The company has done similar things as part of the “collective intelligence” theme that it’s been working on for the past year. For example, in 2015 Statistica gained the capability to tap into machine learning models developed on the AzureML cloud offering hosted by Microsoft (NASDAQ: MSFT). It also lets users reach out and tap into Algorithmia’s cloud-based algorithm market; to Apervita’s health analytic capabilities; and to Experfy’s market for linking data science professionals with clients, which has been called the “Uber of Big Data projects.”
“In collective intelligence, what we’re making available to Statistica clients is our ability to open up their organizations and ability to leverage best of breed models no matter where they might reside in the world,” Thompson says. “It’s a market-leading concept that I don’t see anybody else doing and it’s something that we think is going to be very exciting for organizations as they go forward and realize they can’t hire every data scientist they want and need. Nor do they really need to hire them all. They can enlist them wherever they happen to be.”
Statistica provides a diverse range of analytic capabilities (16,000, to be exact). The software–which Gartner promoted into the Leaders Quadrant in its latest report on advanced analytic tools–helps users to access data from a variety of locations, from standard databases to unstructured repositories like Hadoop; to cleanse and prepare the data for analysis; to explore data visually; and to develop predictive models that can be pushed down to run in Hadoop, relational data warehouses, or even edge devices like routers and gateways.
With a million current users and more than three decades of use, Statistica has seen analytic trends come and go. The current boom around big data shows no sign of ebbing, and has greatly expanded demand for analytic expertise.
In Thompson’s view, the unmet need for data scientists has created a market opportunity that he intends to meet with better software. The new automated data preparation functionality in 13.1 furthers the theme of the democratization of analytics and data science. Instead of paying a small fortune to a data scientist who will spend three-quarters of her time prepping and cleansing data, the organization can rely on Statistica to take care of that for them.
“We want to start moving in the direction to make advanced analytics easier to consume by people who are not PhDs and core data scientists,” he tells Datanami. “We’re making it easier for people…to bring together a wide range of data, and have the system take care of the treatment of outliers and null values and averages and feature selection and things like that.”
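To make the kind of data treatment Thompson describes more concrete, here is a minimal Python sketch of two common preparation steps: filling in null values and taming outliers. It is not Statistica’s implementation, which the article does not detail. The function names (`impute_nulls`, `clip_outliers`, `prepare`) are hypothetical, and the choices of median imputation and a 1.5 × IQR fence are assumptions picked for illustration.

```python
from statistics import median, quantiles


def impute_nulls(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values lying outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def prepare(values):
    """Hypothetical pipeline: impute nulls first, then clip outliers."""
    return clip_outliers(impute_nulls(values))


if __name__ == "__main__":
    raw = [10, 12, None, 11, 13, 500]  # one missing value, one extreme reading
    print(prepare(raw))
```

The median is used for imputation rather than the mean so that an extreme reading such as 500 does not distort the fill value before the clipping step runs.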
Organizations shouldn’t be too concerned about putting so much statistical power into the hands of mere mortals, Thompson says, because Dell included “guard rails and safety bumpers” so they don’t get erroneous results.
The new release also helps organizations organize their analytic assets (including data prep work, data modeling, analytics, and visualizations) and make them available for re-use by other members of the organization. “This addresses the talent gap that many organizations are feeling as they try to adopt big data analytics,” Thompson says.
“We’ve heard lots of buzz about the shortage of qualified data scientists that everybody is trying to hire into their organizations,” Thompson says. “There’s a concept of having experts do the work that they’re eminently qualified to do, and then compiling or storing or publishing that work into repositories so a broader audience of people can use it and reuse it.”
Statistica is also gaining new IoT capabilities via integration with Boomi, the Dell Software company that provides cloud distribution of data and processing. With version 13.1, organizations can take any machine learning model (such as a neural net, a logistic regression, or a boosted tree), convert it to a Java app, and then push it to Boomi, which will further distribute it out to edge nodes.
“We’re one of the few companies out there that has a generally available product that enables you to drive models out all the way to the edge of the network and manage those in a holistic way,” Thompson says.