Keeping the Big Picture In Sight at H2O World
SriSatish Ambati, the CEO of H2O.ai, had just taken the stage at H2O World in San Francisco yesterday morning. His keynote had barely begun, but Ambati clearly had something to get off his chest. “Time is the only non-renewable source,” he said. “Our time is not coming back to us. So make it count. Make every moment yours.”
You could say that Ambati has made good use of his own time up to this point. In addition to authoring the H2O machine learning library and co-founding H2O.ai (initially called 0xdata), he co-founded the big data analytics company Platfora. His professional career also took him to Datastax, Azul Systems, and RightOrder, while his academic career involved sabbaticals in theoretical neuroscience at Stanford and Berkeley and an M.S. in math and computer science from the University of Memphis.
In his time at H2O.ai, Ambati has made a big mark. The core of the operation is the open source H2O software itself, a package of supervised and unsupervised machine learning algorithms, such as K-means clustering, random forests, gradient boosting machines, Word2Vec, and others.
H2O’s generalized linear model (GLM) is generally regarded as the fastest regression algorithm for CPUs on the planet, and is a big part of H2O’s soaring popularity among data scientists. Usage statistics supplied by H2O at yesterday’s conference show significant growth. In 2016, 6,400 companies were using H2O software, a figure that more than doubled to 13,000 companies in 2017. In 2018, 18,000 companies used H2O, accounting for more than 200,000 end users.
The numbers show that the open source H2O package has become a core component of the machine learning stack used by data scientists around the world. According to H2O, 222 of the Fortune 500, eight of the top 10 banks, seven of the top 10 insurance companies, and four of the top 10 healthcare companies are users. Wells Fargo, Cisco, PayPal, Progressive, and Capital One are among the reference customers, and some of them shared their stories at H2O World this week.
Changing the World
Keynotes at tech conferences usually contain flashy visuals, loud music, and soaring verbiage, all in support of bolstering a certain image of a given company. But at H2O World, Ambati took a decidedly different tack, spending a large part of his time on stage expounding on the philosophical and spiritual aspects of life and his own personal journey.
The Sufis had it right, Ambati said against soft mystic chanting on the speakers. Selflessness and authenticity go hand in hand, and together help drive one towards the truth. Love is perfect comprehension. From the individual comes the universal.
These are ideas that shaped Ambati’s path. “I chose to build a machine learning library 10 years ago on the streets of San Francisco, and before you know it, there’s hundreds of people using it, thousands of people,” Ambati said. “To connect the dots, from individual to universal, takes an action — just one click.”
The only wisdom we can hope to acquire, Ambati said, is the wisdom of humility. “Humility is endless,” he said. “Gratitude, I believe, is endless. Thank you San Francisco. Thank you, community partners. Thank you, ecosystem creators.”
Data science is the search for truth, Ambati said. Looking at the world in new and innovative ways is the feedstock that drives data science, and this effort is what drives the Mountain View, California company.
“We’re not changing the world by being the loudest person in the room,” Ambati said. “We’re changing the world by forcing us to rethink the same thing in new ways. And the core theme of H2O is how do we make data scientists happy.
“That’s kind of an impossible task,” he continued. “But I think we’re trying.”
Driving Data Science
H2O’s accessibility and performance have landed it a spot in many data scientists’ quivers, and made it part of a standard stack of data science tools alongside other open source projects like Apache Spark, Anaconda, Scikit-Learn, and others.
Universality is an H2O benefit. Users can build in H2O using Python, R, Java, and Scala. Alternatively, they can interact with the software via H2O Flow, a codeless interactive environment accessed via a Web browser. H2O has become more of a full data science platform in recent releases, such as with AutoML, a new feature that trains and tunes models, then ranks them for the user on a leaderboard.
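To make the leaderboard idea concrete, here is a minimal sketch in plain Python. It is not H2O's AutoML implementation and does not use the H2O API; the three candidate models, the toy data, and every function name (`fit_mean`, `fit_linear`, `fit_nearest`, `build_leaderboard`) are assumptions invented for illustration. It only shows the general pattern of training several candidates and ranking them by a validation metric.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical types for this sketch: a trainer takes (xs, ys) and returns a predictor.
Predictor = Callable[[float], float]
Trainer = Callable[[List[float], List[float]], Predictor]


def fit_mean(xs: List[float], ys: List[float]) -> Predictor:
    """Baseline candidate: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs: List[float], ys: List[float]) -> Predictor:
    """Candidate: ordinary least-squares line through the training points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_nearest(xs: List[float], ys: List[float]) -> Predictor:
    """Candidate: predict the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def rmse(model: Predictor, xs: List[float], ys: List[float]) -> float:
    """Root mean squared error of a model on a data set."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def build_leaderboard(
    candidates: Dict[str, Trainer],
    train: Tuple[List[float], List[float]],
    valid: Tuple[List[float], List[float]],
) -> List[Tuple[str, float]]:
    """Train every candidate, score it on validation data, and sort best-first."""
    rows = [(name, rmse(trainer(*train), *valid)) for name, trainer in candidates.items()]
    return sorted(rows, key=lambda row: row[1])  # lower RMSE ranks higher


CANDIDATES: Dict[str, Trainer] = {
    "mean_baseline": fit_mean,
    "linear_regression": fit_linear,
    "nearest_neighbor": fit_nearest,
}

# Toy data following y = 2x + 1 (made up for this sketch).
train = ([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0])
valid = ([0.5, 2.5, 5.0], [2.0, 6.0, 11.0])

leaderboard = build_leaderboard(CANDIDATES, train, valid)
for rank, (name, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name}: RMSE={score:.3f}")
```

On this toy data the straight-line model ranks first because the data is exactly linear. A real AutoML system would also tune hyperparameters and typically use cross-validation rather than a single validation split.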
H2O models can be developed and trained on a variety of hardware, from Windows, Linux, and Mac PCs up to multi-thousand-node Hadoop and Spark clusters, and even IBM Power Systems. The software also supports public cloud environments at Amazon, Google, and Microsoft, as well as Databricks’ cloud. In addition to the core H2O software, the company offers two other free packages: Sparkling Water, which connects H2O algorithms with Spark, and H2O4GPU, a drop-in replacement for Scikit-learn that lets H2O run on Nvidia GPUs.
Before late 2017, H2O’s income came primarily from selling technical support contracts. The advent of Driverless AI changed that.
Driverless AI is a proprietary product that automates much more of the data science workflow. The key thing that differentiates Driverless AI from H2O is the addition of feature engineering in Driverless AI, according to Arno Candel, H2O.ai’s chief technology officer and the main committer of H2O-3 and Driverless AI.
As an enterprise AI product, the demands placed on Driverless AI by customers are much bigger than with H2O. “It’s almost like shipping the space shuttle,” says Candel, a PhD-holding physicist who modeled electrons on supercomputers and worked at the Stanford Linear Accelerator before getting into machine learning.
“You want to make sure that nothing goes wrong,” Candel continues. “When they’re in production flying somewhere, when one engine stops, what do you do? You still want to get them back down. So we have all these fallback mechanisms. There’s third-party libraries that we use and they might fail. And we have to make sure that it doesn’t fail in the production environment for the customer.”
The complexity level of Driverless AI is going up as H2O builds a distributed version of the product that will run in a variety of environments. “There’s actually a lot more than just the fitting of a machine learning,” Candel says. “It’s the whole packaging, the whole deployment. People need to be able to run it on an IBM system, on an Intel system, on premise, in the cloud. They have to run in secure channels and they have to have data from different sources.”
Candel solicits feedback from H2O customers for new features. In addition to the distributed version of Driverless AI, the company is in the process of building H2O-3 into the product, which will keep the company busy for a while.
But Candel wants more. In fact, he craves more. He needs a bigger challenge.
“If you want a specific loss function, or want a way to balance the true positives and true negatives, or employ on an embedded device somewhere, or want to train it on CPUs or GPUs, ask us,” the CTO said in a session at H2O World yesterday. “The company is growing and there’s more and more people involved in keeping customers happy. So if you have features you want … talk to us. We are listening.”
Ambati recently challenged his technical team to build something “completely orthogonal” to H2O. The result is Quantum, a new business intelligence (BI) offering that will work with H2O and Driverless.
The idea with Quantum is to not only provide an interface to let users explore large data sets without the extra data clutter (the product will use machine learning to show only the data that matters), but to provide a feedback loop of sorts to inform future ML strategies.
Building new tools for data developers is a calling for H2O. Ambati has a vision for where AI can go, and it’s clear we’re not there yet. We’re still in the very early stages of AI adoption, he says, and there will be lots of failure. There’s no guarantee of success in this business, but the alternative of not trying, of not continuing to push the edge, is something worse than failure.
“We have to continue to innovate or we will not be relevant,” Ambati said during his keynote. “Innovate or die.”
Editor’s note: This article was updated to correct SriSatish Ambati’s academic credentials.