How Machine Learning Is Eating the Software World
Marc Andreessen famously said in 2011 that software was eating the world. Four years later, that trend has accelerated, only now it appears that machine learning technology is on the cusp of eating software, and that algorithms will take over the world, with a little help from their friends: the APIs.
Not that this is a bad thing, at least not in the way Elon Musk envisions, with AI-powered overlords enslaving the human race (a separate story for another day). But recall the points that Andreessen made in his famous Wall Street Journal article: outfits like Amazon, Netflix, Flickr, and Pandora, which are essentially software companies, eviscerated the “bricks and mortar” giants that had dominated the markets for books, movies, photos, and radio up to that point. If you recognized that trend then, you have probably noticed that it has only intensified here in 2015.
In today’s big data world, the focus is on building “smart applications.” More often than not, the intelligence in those apps doesn’t come from adding programmatic responses to the code. It comes from allowing the software itself to recognize what’s happening in the real world and how that differs from what happened yesterday, and to adjust its response accordingly.
Computers are learning to think, read, and write, says Bloomberg Beta investor Shivon Zilis. “They’re also picking up human sensory function, with the ability to see and hear (arguably to touch, taste, and smell, though those have been of a lesser focus),” she writes on her blog. “Machine intelligence technologies cut across a vast array of problem types (from classification and clustering to natural language processing and computer vision) and methods (from support vector machines to deep belief networks). All of these technologies are reflected on this landscape.” (See below for Zilis’ informative 2014 graphic of players in the ML space.)
Armed with ever-increasing volumes of data and sophisticated machine learning modeling environments, we’re able to discern patterns that were never detectable before. The next step, deploying those machine learning models into real-world applications, can be tricky. The question, then, becomes how to deploy machine learning technology in the quickest, most efficient, and most impactful way. Of course, that is easier said than done.
Checking the ML Box
One person who has puzzled over this challenge more than most is Sri Ambati, the co-founder and CEO of H2O.ai, which develops tools that data scientists use to build machine learning models. “Up until now, most of the focus was this offline analysis,” Ambati says. “The way data analytics was done historically was you’d have a statistician or a mathematician sitting in the corner trying to see information from analysis.”
The prototypical data scientist (one part math genius, one part Java developer, and one part business expert) would be your typical go-to person for deploying a predictive app with a machine learning algorithm at its heart. Of course, there’s a huge shortage of data scientists, so naturally people (and venture capitalists like Andreessen and Zilis) are looking to software to solve the problem.
H2O’s Ambati hopes to make the power of machine learning accessible to rank-and-file programmers with the next iteration of the H2O product, which was launched today and is free under an open source license. With version 3, H2O lets Java, Python, and Scala developers access the H2O machine learning technology directly from their integrated development environment (IDE). What’s more, they can call the machine learning routines through a simple REST API.
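To give a rough sense of what calling a machine learning routine over REST can look like from application code, here is a minimal Python sketch. It is not H2O's actual API: the route layout (`/models/<id>/predict`), the JSON request body, and the `prediction` response field are all assumptions made up for illustration, as are the function names. The sketch only builds the request and parses a response, so it runs without a server.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_scoring_request(base_url: str, model_id: str, row: dict) -> Request:
    """Build a POST request asking a (hypothetical) model server to score one row."""
    # Hypothetical route layout; a real service defines its own paths.
    url = f"{base_url.rstrip('/')}/models/{quote(model_id, safe='')}/predict"
    body = json.dumps(row).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(body: bytes):
    """Pull the prediction out of a (hypothetical) JSON response body."""
    return json.loads(body.decode("utf-8"))["prediction"]


# Building the request does not touch the network; sending it would be
# a call to urllib.request.urlopen(request) against a running service.
request = build_scoring_request(
    "http://localhost:8080", "churn-model", {"age": 42, "plan": "basic"}
)
```

The point of the pattern is that the application developer only handles plain data going in and a prediction coming out; the model itself stays behind the endpoint.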
“What we see as the vision for the space is that these analyses happen on the fly as things are happening, and as this is happening, you’re changing your models and building new business solutions,” Ambati tells Datanami. “The future of software engineering is going to be transformed with data science and machine learning. Software is eating the world and machine learning is eating software.”
Ambati aims to make it easy for regular software developers to work with the various machine learning algorithms that H2O includes in its library, including gradient boosting machine, deep learning, generalized linear model, K-Means, distributed random forests, and naïve Bayes. Once you have the data, it’s a relatively simple matter to turn the machine learning algorithms loose on it and let them find the patterns for you. Those patterns can then be encapsulated in software code and exposed through APIs, so that developers can embed them into smarter apps from whichever IDE they use.
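The two steps described here, finding patterns in data and then wrapping them in code an app can call, can be sketched in a few lines of plain Python. This is a toy K-Means written from scratch for illustration, not H2O's implementation. The function names, the two-dimensional sample data, and the deterministic farthest-first choice of starting centroids are all assumptions made to keep the example small and repeatable.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def fit_kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> List[Point]:
    """Toy K-Means: returns k centroids learned from the points."""
    # Farthest-first start (a simplification): begin with the first point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids)))
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for i, members in enumerate(clusters):
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids


def make_segmenter(centroids: Sequence[Point]) -> Callable[[Point], int]:
    """Wrap the learned centroids in a plain function an app can call."""
    frozen = tuple(centroids)
    return lambda point: nearest(point, frozen)


# Made-up data: two obvious groups of customers (e.g. visits, spend).
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
segment = make_segmenter(fit_kmeans(data, k=2))
```

Once `segment` exists, application code no longer needs to know how the centroids were found. It passes in a new data point and gets back a cluster number, which is the same separation of concerns an API over a trained model provides.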
ML for the Masses
H2O, of course, isn’t the only big data software company chasing this goal. The big cloud players, Amazon, Microsoft Azure, and Google Cloud, have all launched cloud-based machine learning systems that allow developers to call machine learning tasks through an API. But if you want to keep your apps on-prem, or want more sophisticated capabilities than the cloud players can provide, you will need to look elsewhere.
Cask is an open source software outfit that’s seeking to build higher-order application templates for Apache Hadoop that take the sting out of tedious data science and app development work. As Cask CEO Jonathan Gray recently explained, the company is seeking to bundle sophisticated machine learning capabilities behind a simple API.
“We can hide the machine learning modeling task as a template, so a developer…can be writing their code against domain-specific API, against a higher-level API,” Gray told Datanami in a recent interview. “Our mission in life is to get a developer as far down the path of solving the problem as they can, so when they get to our platform, instead of being all the way on the other end zone, they’re on the five-yard line coming in, and you really just need to tweak this and that, write a little code here, some custom logic there, and you’re done.”
Hadoop may be the standard bearer for the types of large-scale data analytics work that took place over the last decade. But increasingly, organizations are looking to Apache Spark to give them an edge. Databricks, the company behind the open source framework, is looking to put the power of big data analytics and machine learning modeling into the hands of as many developers as it can.
“A very important part of Spark is the productivity the user gains,” Databricks co-founder Reynold Xin told Datanami in a recent interview. Whereas Java was the go-to language for first-gen Hadoop and MapReduce jobs, Spark opens that world up to Scala and Python. “We just want them to be using what they’re comfortable with.”
Machine learning is all around us, and its use will only accelerate as time goes on. Luckily, as the software gets better, as the distributed systems get faster, and as more data sets become available, you won’t need a data scientist to monetize it—just the ability to code an application and call an API. The possibilities for application developers to take advantage of machine learning are tantalizing. Only time will tell what they make of it.