An Enterprise Formula for AI Success
One of the great things about the current wave of AI innovation is the large number of open source tools, technologies, and frameworks. From TensorFlow to Python, Kafka to PyTorch, we’re in the midst of an explosion in the diversity of data science and big data toolchains. However, when it comes to putting these toolchains together and building real-world AI applications, regular companies suffer from a serious technology gap compared to technology firms.
The technology giants have a curious habit of releasing powerful technology onto the unsuspecting masses. For example, in 2015 Google unveiled TensorFlow, which enables users to build and deploy very large and very accurate neural network models. A year later, Facebook released PyTorch, which some say is an easier-to-use framework for machine learning development. Both are among the most heavily used technologies for machine learning today.
Nobody is complaining too much about Google’s and Facebook’s decisions to release such ground-breaking technology. After all, they’ve been at this for many years. While the tech giants do benefit by getting the open source community to continue to develop and maintain technology that they put into the public realm, it’s safe to say that the open source community receives a bigger benefit than the tech giants.
But these AI gains have not flowed equally. Many of the latest open source AI technologies are not known for being easy to work with, and typically require highly skilled data scientists to use. This puts a cap on the applicability of the AI tech, and limits its use to companies that have the budget to hire experienced data scientists.
That leaves a lot of companies out of luck when it comes to leveraging the latest in AI innovation, according to Phil Gurbacki, the senior vice president of product and customer experience for DataRobot, a provider of automated machine learning and enterprise AI offerings based in Boston, Massachusetts.
“I think there’s certainly a significant portion [of the emerging AI stack] that is open source or brought in from the open source community,” Gurbacki says. “I just don’t know that there’s a good way of supporting and bringing that to an enterprise in a scalable way.”
There are pockets of regular (i.e. non-tech) companies that are tech-savvy and are able to work with the emerging open source AI technologies.
“But when we’re working with retailers and insurance companies, they’re really looking for that problem to be solved for them,” Gurbacki says. “The open source certainly has a place in high-tech companies. But there are many other markets where we’re finding there’s a tremendous amount of value that our customers are getting from just our ability to package everything up for them and bring them to an organization.”
DataRobot’s enterprise AI platform is not open source. You cannot just download it and begin using it any way you like (although you can probably get a trial copy if you try). While the company’s software is not open source, it does make use of a number of open source components, Gurbacki says.
“Pieces of the product offering” are open source, he says. “We’re just finding that organizations aren’t satisfied piecing together all of the parts themselves.”
DataRobot’s architecture allows users to quickly adopt and plug in new architectures and new frameworks, Gurbacki says. “You can use TensorFlow models, Keras models, LLVM,” he says. “You can use CNTK in Microsoft’s toolkit or Python models. We have this plug-in architecture that allows us to quickly and rapidly adopt new open source technology.”
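As a rough illustration of what a plug-in architecture of this kind can look like, the sketch below registers model adapters under a framework name behind one shared `predict` interface. It is not DataRobot’s implementation. `ModelAdapter`, `register`, `load_adapter`, and the two toy adapters are hypothetical names invented for this example, and a real adapter would wrap a TensorFlow or Keras model rather than simple arithmetic.

```python
from typing import Callable, Dict, List, Protocol


class ModelAdapter(Protocol):
    """Shared interface every plug-in must satisfy (hypothetical)."""

    def predict(self, rows: List[List[float]]) -> List[float]: ...


# Maps a framework name to a zero-argument factory that builds an adapter.
_REGISTRY: Dict[str, Callable[[], ModelAdapter]] = {}


def register(framework: str):
    """Class decorator that adds an adapter to the registry under a name."""
    def decorator(factory):
        _REGISTRY[framework] = factory
        return factory
    return decorator


@register("toy-sum")
class SumAdapter:
    """Stand-in for a real framework wrapper: predicts the row sum."""

    def predict(self, rows):
        return [sum(row) for row in rows]


@register("toy-max")
class MaxAdapter:
    """Second stand-in: predicts the largest value in each row."""

    def predict(self, rows):
        return [max(row) for row in rows]


def load_adapter(framework: str) -> ModelAdapter:
    """Build the adapter registered for a framework, or fail loudly."""
    if framework not in _REGISTRY:
        raise ValueError(f"no adapter registered for {framework!r}")
    return _REGISTRY[framework]()


rows = [[1.0, 2.0], [3.0, 4.0]]
sum_predictions = load_adapter("toy-sum").predict(rows)
max_predictions = load_adapter("toy-max").predict(rows)
```

The point of the design is that supporting a new framework means adding one registered class, while the calling code that asks for predictions stays unchanged.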
Because it works so heavily with open source technology, DataRobot often runs into bugs in that code. When it does, it often contributes fixes back to the open source project, so that others can benefit from the find.
End-to-End Delivery
DataRobot was founded in 2012 and has grown quickly over the last few years. It now has more than 1,200 employees, with a valuation reportedly in excess of $1 billion. It’s helping to solve data science challenges for customers like United Airlines and Black & Decker.
In its early days, the company focused primarily on automated machine learning. The software brought automation to many of the tasks that typically required a data scientist. That includes things like determining which algorithm is appropriate for certain data sets (it’s pre-loaded with hundreds of open source algorithms, for multiple languages, from all the popular packages). After testing against different algorithms, the software would also automate deployment of the model as a Spark or a Python job to a big data cluster, running Hadoop or other popular platforms.
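To make the idea of automated algorithm selection concrete, here is a minimal sketch that trains several candidate algorithms, scores each with k-fold cross-validation, and keeps the one with the lowest error. It is an illustration under simplifying assumptions, not the product’s actual method: the two candidate trainers, the helper names (`train_mean`, `train_linear`, `cross_val_mse`, `select_best`), and the synthetic data are all invented for this example.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[float], float]  # maps one feature value to a prediction
Trainer = Callable[[Sequence[float], Sequence[float]], Model]


def train_mean(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Baseline candidate: always predict the mean training target."""
    avg = mean(ys)
    return lambda x: avg


def train_linear(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var if var else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def cross_val_mse(trainer: Trainer, xs, ys, folds: int = 4) -> float:
    """Average held-out mean squared error across interleaved folds."""
    fold_errors = []
    for k in range(folds):
        train_x = [x for i, x in enumerate(xs) if i % folds != k]
        train_y = [y for i, y in enumerate(ys) if i % folds != k]
        held_out = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % folds == k]
        model = trainer(train_x, train_y)
        fold_errors.append(mean((model(x) - y) ** 2 for x, y in held_out))
    return mean(fold_errors)


def select_best(candidates: Dict[str, Trainer], xs, ys) -> Tuple[str, Dict[str, float]]:
    """Score every candidate and return the lowest-error name plus all scores."""
    scores = {name: cross_val_mse(trainer, xs, ys) for name, trainer in candidates.items()}
    return min(scores, key=scores.get), scores


# Synthetic data with a clean linear relationship (assumption for the demo).
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 for x in xs]

best_name, scores = select_best({"mean": train_mean, "linear": train_linear}, xs, ys)
```

Scoring on held-out folds rather than on the training data is what keeps the comparison honest: a candidate that merely memorizes its training rows gains no advantage in the ranking.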
It’s widened its repertoire considerably since then, Gurbacki says. “What you’ve seen us do over the last three to five years is really focus on delivering that end-to-end platform and putting the pieces together,” he says. For example, it has widened its capabilities with several acquisitions, including the purchase of data prepper Paxata in December and its June 2019 acquisition of ParallelM.
But the biggest value DataRobot provides is helping clients to adhere to data science best practices while automating as much of the end-to-end lifecycle as it can.
“So when we’re automatically creating features and generating new derived calculations, we’re following data science best practices to make sure we’re not over-fitting,” he says. “It’s not just the open source value we’re providing. It’s also the layer of automation and integration between the pieces that really provide the value.”
The cloud looms large when it comes to AI. Deloitte (one of DataRobot’s customers) predicted that 70% of the companies that adopt AI technology by 2019 would get their AI capabilities through cloud services. It also predicted that 65% of companies doing AI would create AI applications using cloud-based development services.
The cloud giants offer similar levels of simplicity and insulation from technological complexity as DataRobot, and in the wake of Hadoop’s implosion, they are attracting plenty of customers to their comfy big data stacks. Google Cloud Platform, Amazon Web Services, and Microsoft Azure each offer platforms that can do just about everything you’d ever want to do with data.
The pinch, of course, is that you have to do it on their cloud. That drives some customers into the waiting arms of DataRobot.
“There’s going to be an AI platform on every cloud,” Gurbacki says. “Amazon will have one. GCP will have one. Azure will have one. But what we’re hearing is customers don’t want to be locked into a single cloud vendor. They really like DataRobot because we’re cloud agnostic in the sense that, if a customer invests in DataRobot, we can run in any of those clouds or on-prem.”
Companies have a lot of choices for how they get their AI these days, whether it’s pure open source or cloud services. DataRobot is finding a happy medium with its hybrid approach, which exposes customers to the benefits of fast-moving open source technology while insulating them from the vagaries of cloud vendor lock-in. For some customers, it could be the best of both worlds, delivered with a robot’s smile.