October 24, 2013

Skytree Hangs Machine Learning Hat On Hadoop

Alex Woodie

The IT discipline of machine learning is experiencing a resurgence of popularity, as organizations explore any possible way to leverage their big data stockpiles for a competitive advantage. One machine learning software company that looks poised to grow is Skytree, which this week announced support for the biggest big data platform of them all, Hadoop.

Skytree worked with the three biggest Hadoop distributors–Cloudera, Hortonworks, and MapR Technologies–to get its Skytree Server suite of machine learning software certified on each of those distributions. This will benefit Skytree customers primarily by eliminating the need to move the data before analyzing it, explains Skytree CEO and co-founder Martin Hack.

“The idea is to make machine learning more accessible, easier to use, and ultimately get a broader audience for it,” Hack tells Datanami. “Machine learning has been around for a long time. It started out with artificial intelligence in the 50s and 60s. That didn’t go so well. In the 80s and 90s we had neural networks. That did a little better, but it’s still not as appealing.”

Today’s machine learning technology is not about turning computers into living things. “We don’t want Skynet,” Hack said, jokingly, just before his phone connection suddenly died. (Editor’s note: Do not joke in front of the computers.)

What a learned machine might look like (artist’s conception).

Today’s machine learning algorithms are not about world domination, à la HAL 9000 or the fictional self-aware computer in the movie The Terminator. Instead, machine learning today is about finding some aspect of a business and applying loads of math and process rigor to it until that aspect–be it fraud detection, customer retention, or product recommendation–can be improved to such an extent that it makes the company more money, saves it money, or reduces risk.

That is, in a nutshell, what all big data analytics software vendors are trying to do. So what makes machine learning and Skytree unique? Skytree has goals similar to those of other big data software companies–deliver results, and do it faster, easier, and cheaper than others–but Hack insists that its focus on machine learning gives it a slightly different flavor than those that rely purely on statistics or other approaches.

“We’re not a Hadoop company. We’re not a database company. We do machine learning. That’s all we do,” Hack says. “We’re the first to have a platform approach, so data scientists and data analysts can use it on a broad range of applications and use cases. It’s enterprise ready, meaning it’s extremely scalable, extremely fast and extremely high accuracy.”

The fact that Skytree Server can run on Hadoop will not only minimize the need to move data but also simplify life for IT administrators, because Skytree and its 60 to 70 machine learning algorithms can run right on the Hadoop cluster. “We have customers with 1,500 to 2,500 [node] Hadoop clusters under management. What happens is they create a policy where 500 nodes are used for ETL, 500 for the database, and 500 for machine learning. It works out quite nicely,” Hack says.

Roll-it-yourself Hadoop shops may be inclined to go the Mahout route and use the open source machine learning engine that is a project of the Apache Software Foundation. Other established firms may opt to take (or continue with) the statistical approach and stick with statistical leaders like SAS or SPSS.

But neither of those approaches provides the scalability and performance that Skytree was designed to deliver, Hack argues. “It really depends on how big the data sets they are trying to do, and how much production level quality they want,” he says.

It’s not just about the algorithms, either. Skytree aims to build a management infrastructure around the math that enables users to move from one problem to another, as they make discoveries about their data and the algorithms learn from that. That capability to refine and improve one’s algorithms is, after all, the heart of the machine learning approach. But once companies discover the power of machine learning, they’re apt to want to expand it.

Predicting churn is a classic use case for this type of software. If a company can detect when a customer is about to defect, it can step in with the right product at the right time to prevent that from happening, not only preserving revenue but also avoiding the cost of acquiring a new customer to replace him. “We give them a new arsenal of weapons, so now they can do much more targeted and effective campaigns,” Hack says.

But churn alone is not going to help you, he says. “It’s usually much broader than just one problem you’re trying to solve,” he says. “You need to do segmentation and scoring. You might want to use it with algorithmic pricing. In other words, there’s a whole cycle to it. If you only do one thing, it’s a point solution, and you have a black box.”

There’s nothing wrong with a black box approach, which is fine for some customers. “It’s just that the customers we work with don’t like it very much, because they have more than one use case,” he says. “And if you have more than one use case, pretty much by default, you need more algorithms and a broader appeal. It becomes a more integrated approach.”

Skytree Server runs on x86 Linux servers and doesn’t require users to be Ph.D.-level statisticians. The software can be used by everybody from everyday users who don’t know SQL all the way up to the most experienced data scientists, Hack says. The package has anywhere from 60 to 70 algorithms at the moment, and more are being added every month by Skytree’s team of mathematicians and engineers.

What algorithms, specifically, come out of the box? There are ones listed on the website, like K-Means, principal component analysis (PCA), two-point analysis, random decision forest, and others. But what else? “We don’t disclose all of them for competitive reasons,” Hack says. “That’s what we do. The algorithm guys, the founders, the people on the team–they’re math guys. We create the math underneath essentially, while the computational aspect comes from people with HPC backgrounds.”

So far, the company has more than 30 customers, including UPS, Samsung, eHarmony, and Adconion. Investors include U.S. Venture Partners, Javelin Venture Partners, and Osage University Partners; UPS and Samsung have also invested in the firm.
