March 17, 2014

Astronomical Algorithm Powers Data Analytics Startup

Alex Woodie

Astronomers at the national labs have enjoyed a handy fallback plan when faced with a glut of images that need analysis: grad students. So when researchers at UC Berkeley developed machine learning algorithms that could automatically scan these images, not only did grad students need something else to do between classes, but the developers realized they might have the makings of a winning big data analytics business plan on their hands.

That big data business plan got a shot in the arm today when the company announced a $2.5 million round of Series A funding, led by Voyager Capital. That will go far in helping to scale the company, which was co-founded by UC Berkeley astrophysics professor Joshua Bloom and four other leaders in computer science, statistics, and machine learning.

The other big bit of news is the hiring of data analytics industry veteran Jeff Erhardt to head the company as its CEO. Erhardt recently briefed Datanami on the company’s goals and the unique way it’s approaching the booming market for machine learning (ML) algorithms. Erhardt describes how Bloom and his colleagues were inundated with time-series images collected by telescopes run by UC Berkeley and Los Alamos.

“The traditional way to do it is to hire more grad students to try to understand if there is something interesting going on in these images,” he says. “What they ended up doing was applying ML to that, to try to predict from these streams of images coming in whether there is something interesting that’s going to happen in the future that merited further research, i.e., by pointing these expensive telescopes at it.”

While the data science, statistics, and Python programming that went into the company’s random-forest ML work is interesting in its own right, what separates the company is the application framework it built around the algorithm. “Machine learning is the wave of the future. But it is still way too hard for a normal company to implement, i.e., somebody who’s not Google or Facebook with a 100-person ML team,” Erhardt says. “But more importantly it needs to be easier for a business-facing person to implement and understand the results.”

The main problem with real-world applications of ML algorithms is the amount of skill and time it takes to train and tune the algorithm to work on a particular user’s data. The company is addressing this problem with its cloud-based application framework, which Erhardt says speeds the ML development cycle.

“Because [the algorithms] are so efficient and high performant, we’re able to effectively leverage cloud computing to do all the parameter tuning that normally takes a data scientist days or weeks to figure out,” Erhardt says. “We basically do that automatically. We basically train the machine to develop the ML model.” The company relies on Apache Spark to provide the massive parallelism needed to help its customers process images, text, and time-series data. On top of this framework, the company has developed workflows and APIs that allow customers to connect the ML application with their various cloud-based and on-premise applications.
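For readers curious what automated parameter tuning looks like in practice, here is a minimal sketch in Python. It is not the company’s method: it swaps in a toy k-nearest-neighbors classifier for the random-forest models described above, and a local thread pool for the Spark cluster. All function names (`make_data`, `knn_predict`, `cv_accuracy`, `auto_tune`) and the parameter grid are hypothetical, chosen only for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def make_data(n=200, seed=0):
    """Toy two-class data: label is 1 when x + y > 1, with about 10% label noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = int(x + y > 1.0)
        if rng.random() < 0.1:
            label = 1 - label  # flip the label to simulate noisy observations
        rows.append(((x, y), label))
    return rows


def knn_predict(train, point, k, p):
    """Majority vote among the k nearest training points (L_p distance, k odd)."""
    def dist(a):
        # The p-th root is skipped: it does not change the ordering of distances.
        return abs(a[0] - point[0]) ** p + abs(a[1] - point[1]) ** p

    nearest = sorted(train, key=lambda row: dist(row[0]))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)


def cv_accuracy(data, k, p, folds=5):
    """Cross-validated accuracy: every row is held out exactly once."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, pt, k, p) == label for pt, label in test)
    return correct / len(data)


def auto_tune(data, grid, workers=4):
    """Score every parameter combination concurrently and return the best one."""
    candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda c: cv_accuracy(data, **c), candidates))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


if __name__ == "__main__":
    params, score = auto_tune(make_data(), {"k": [1, 5, 15], "p": [1, 2]})
    print(params, round(score, 3))
```

Each candidate setting is scored independently of the others, which is why this kind of search spreads well across many machines. The thread pool here only illustrates that fan-out shape; a production system would distribute the same work across a cluster.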

The setup provides another way to solve classic big data problems, such as churn analysis and lead scoring. Many bigger organizations have adopted Hadoop to help address these problems, but the company doesn’t see why smaller shops can’t take advantage of breakthroughs in big data analytics.

“There’s a tremendous amount of structured and relational data in the world that doesn’t require a Hadoop cluster to aggregate and understand,” Erhardt says. “And at the same time, this data is being underutilized by customer-facing employees who are responsible for making data-driven decisions. It really is that part of the market we’re trying to address.”

The hiring of Erhardt and the first round of funding are aimed at scaling the company to address the “customer experience market,” which Gartner pegs at $20.6 billion for 2014. The company already has a dozen customers in production and is looking to ramp up aggressively, though it is still in the early stages of its development. With so much talent already assembled, it’s a company worth keeping an eye on, especially as the ML arms race continues and organizations begin to adopt the next generation of ML-based analytic applications, which will be easier than ever to use.
