AI Democratization a Work in Progress, H2O’s Ambati Says
While only about 1% of companies are making the most of their data today, real progress is being made in democratizing the use of AI, and the future of business automation via AI is quite bright, H2O.ai’s CEO and founder Sri Ambati said before a pair of H2O World conferences this week.
“There’s still a long way to go from where we are. It’s in the earliest phases of adoption,” Ambati told Datanami in an interview earlier this month. “You can see that only 1%, or less than 1%, of the world’s companies can truly leverage their data. So that means 99% needs further adoption, simplification, and cultural transformation to use data and AI. It’s going to take the next 10 to 20 years.”
H2O.ai may be best known for its eponymous open source machine learning platform, which is used by tens of thousands of data scientists and machine learning engineers around the world. Ambati said he enjoys the fact that H2O is commonly cited in job descriptions for data scientists, alongside widely used technologies like TensorFlow, scikit-learn, PyTorch, and Gluon.
But these days, Ambati spends much of his time thinking about how best to automate the use of machine learning through H2O’s enterprise AutoML offerings, including Driverless AI, which simplifies the application of traditional machine learning programs, and more recently through Hydrogen Torch, which brings automation to deep learning, specifically the popular PyTorch framework.
Ambati is particularly bullish on the potential of Hydrogen Torch, which is based in part on input provided by the 33 Kaggle Grandmasters that H2O works with. For example, Hydrogen Torch includes templates created by Grandmasters like Philipp Singer, a senior data scientist at H2O who is currently ranked number three on the Kaggle charts. “We’re digitizing their best practices,” Ambati said.
Deep learning techniques are predominantly used in the areas of computer vision and text processing, and the goal with Hydrogen Torch is to lower the barrier to entry into these forms of AI.
“What we did with Driverless AI was make machine learning very accessible,” said Ambati, a 2019 Datanami Person to Watch. “What this is doing is actually making deep learning very accessible, whether it’s object detection or text summarization.”
While tabular data is popular in traditional machine learning, the emerging deep learning use cases rely on less structured data sources, including images and documents. H2O’s new Document AI solution, launched earlier this year, enables its customers to use documents as primary data sources for AI.
“Documents can be much more high-fidelity data than the group-bys and filter joins, because there is the potential for error across those tables,” Ambati said. “Especially in the last 18 months, [the usability] of large language models and pretrained models has gotten so much more accurate that we can now use unstructured sources of data as the real form of data. We used to use it as an alternate source of data, and now we look at it as the main source of data.”
Document processing is critical across large swaths of industry, including healthcare, insurance, banking, telecommunications, and government. The combination of high-level optical character recognition (OCR) scanning and AI systems such as H2O Document AI is giving companies a real leg up in terms of processing these documents.
One of H2O’s customers in the insurance business was able to raise the accuracy of its automated document handling system from the 60% to 70% range up to the 95% to 98% range. That helps take the pressure off existing staff members, Ambati said.
H2O hosted a pair of H2O World events this week, including one in Sydney and another in Dallas, Texas. The company rolled out new offerings at the shows, including a new labeling tool for deep learning use cases and a new wizard for Driverless AI.
The new Label Genie brings enhancements in the area of one-shot and zero-shot learning, which means customers don’t need to provide as many examples of an object before the system can start to recognize it. It also brings support for audio data.
The new Driverless AI Wizard, meanwhile, will further reduce the level of skill required to be productive in the AutoML tool. “We added a new wizard to make it almost as easy for an analyst to start using AutoML,” Ambati said. “I think it’s just bringing that bar further and further down, to make it easy to use.”
Ambati is a big supporter of the democratization of AI and machine learning, but he understands there are limits. He said he’s not a proponent of the “citizen data science” movement, in which people without formal training or experience can start building ML and AI models.
In the same way that Hydrogen Torch puts the capability of a full-blown Kaggle Grandmaster into the hands of a competent data scientist, Driverless AI will put the capability of a data scientist into the hands of a business analyst.
“But he’s still a data-savvy person who is not fooled by the early results,” Ambati said. “Our core mission is to democratize AI. So how do I get from the Grandmasters to grandmas using AI…. That means that we need to simplify the space–the whole space, not just simply the user experience. The user experience is just one step.”
As the barriers to AI come down and more people start adopting it, the need grows for greater data education and a stronger data culture, Ambati said. People working with data need to have a healthy skepticism about what the models are saying, how they might be wrong, and what biases might be at play.
“The data is telling a story, but people can interpret it in ways they want to and make decisions that are actually along the lines of what they had hypothesized to begin with,” he said. “I think being able to make sure that there is enough data literacy and then, understanding that in machine learning, all models are wrong, but some models are useful.”
As AI evolves, humans will evolve with it. Some jobs may become redundant with AI, but at the same time, employees will also become more productive and effective thanks to AI helpers. Ambati singled out large language models as having great potential to automate tasks across a range of industries.
Titles and job descriptions in the fields of data science and advanced analytics are changing, too. Data scientists who have proven their worth will have new career paths open up to them in the C-suite, including as chief data and analytics officers (CDAOs), Ambati said. In fact, Ambati predicts that by 2030, a good percentage of CEOs will actually be former data scientists.
“We’ve seen a lot more business owners ask data scientific questions,” he said. “That’s actually been very refreshing.”