A Pressing Need for More Data Literacy
As big data and AI become greater influences in people’s lives, the need for a basic education on how data and AI systems work is becoming more apparent. Two data science industry leaders, the CEOs of Anaconda and H2O.ai, recently shared their views on how data literacy is becoming an essential requirement for engaging with the modern world in a meaningful way.
Children are growing up steeped in technology, giving them a better grasp of how to get around in today’s data-driven world. However, many adults lack that insight, which is why there’s a big need for data literacy education among adults, said Peter Wang, the CEO of Anaconda.
“When I talk about data literacy, it really is about adults, sometimes older adults who are already in positions of power,” Wang said. “They could be septuagenarians in the halls of Congress. For a lot of these folks, the reason why data literacy is so important is that the world today is driven by computational systems powered by data making assessments and judgments and making decisions and influencing every aspect of our lives in a data driven fashion.”
If you don’t understand how a computer application works, such as an airline reservation system, that’s fine, Wang said. People are able to drive cars without knowing the ins and outs of how the planetary gear in a transmission works either, he pointed out. But you won’t get too far without knowing the rules of the road of the data and AI-powered world that’s being built around us, he said.
“When it comes to the modern kind of machine learning and predictive systems and soon AI systems and robotic and cybernetic systems, it’s really important for people to actually understand how those things work,” Wang said. “Not in terms of the details of the programming, but in terms of what can and can’t be done with data, where things like bias and explicit human heuristics come in, and the limitations of computerized systems.”
AI and machine learning are being driven into nearly every aspect of our lives. From financial services and healthcare to government and law, just about any decision that involves data can be automated to an extent using predictive algorithms. Our data literacy will determine the extent to which we believe those predictions are not only accurate but also generated in an ethical, moral, and legal manner.
Wang emphatically rejects all-or-nothing thinking, where one just blindly accepts the results of “black box” algorithms because they’re supposedly based on math or science. Similarly, we should push back on any attempt to eliminate debate on decisions because we’ve wrapped ourselves in the mantle of “data.”
Both AI and data are fallible, he says, and can fail us in many different ways. Data literacy is about understanding the failure modes (or as many as are feasible).
“It’s simply not acceptable I think for us as a species for 99% of the people to say ‘Well, it’s a black box. There’s an AI in it. Surely the AI is making the right decisions?’ or ‘I hate it because it’s AI. You can’t trust any of it,’” Wang said.
“Those kinds of simplistic binary modes or approaches to sense-making around the world of fused AI is not going to be good,” he continued. “It leads to a very bad set of outcomes for everyone. And that’s why we care so much about data literacy. For people to live in that kind of world, they need to be able to understand a little bit more nuance what is and is not possible around this stuff.”
All Models Are Wrong
Data is important to this discussion too. The ability to gather and process data is central to any scientific endeavor. Without data, there is no science, let alone data science and AI. But not all data is equal. Data can mislead us just as easily as it can show the path. Understanding when data is lying to us is a critical skill, but it’s too often in short supply, according to Sri Ambati, the founder and CEO of H2O.ai.
“The data is telling a story, but people can interpret it in ways they want to and make decisions that are actually along the lines of what they had hypothesized to begin with,” Ambati told Datanami recently. “I think being able to make sure that there is enough data literacy and then, understanding that in machine learning, all models are wrong, but some models are useful.”
H2O.ai is working to democratize AI, and data literacy is a big part of that overall journey to make the technology easier to work with and help users adapt to the future AI-powered world. According to Ambati, there is a large need for much more education and awareness on data literacy in organizations.
“It’s the place where we need to have global learning and [fill] those blind spots around it. We have a lot more education and awareness to be built in the organization,” he said. “We need to continue to simplify the entire space to…make intelligence and learning easier. We’re on that path, but we need to simplify it in order to get that iPhone-like simplicity. We’re not there yet, as a whole industry.”
While AI will replace some jobs, the more interesting scenario has humans being augmented with AI. A recent study found that deep learning was not as good as human radiologists at detecting tumors. However, a human radiologist equipped with deep learning was better than either deep learning or a human working alone.
“When you’re combining human and artificial intelligence, it becomes more and more of the common trend,” Ambati said.
A Better Framework
Wang recalled Cloudera’s old motto: “Ask bigger questions.” While its collection of data management and data science tools may not have won a decisive battle in the open market, the vendor had the right approach to framing the bigger purpose. People want to ask bigger questions, and are using data-driven approaches to ask them. With more data literacy, they’ll make better use of the tools.
“A lot of people in the middle [management roles] are learning data sciences. They’re taking boot camps, they’re trying to get more and more into this stuff,” Wang said. “And what I’m suggesting with the data literacy framing is that we should give them not just the tools, but we need to help orient their thinking around these tools. They’re not there to help you confirm or just count things, but they’re there to help you open up your mind and open up how you understand the world.”
As the CEO of Anaconda, the provider of Python-based data science libraries and tooling, Wang has done quite a bit to enable data scientists, analysts, and developers to succeed with data science. However, Wang sees society-wide impacts if one aspect of the broader data literacy effort fails, specifically around controlling one’s digital destiny.
The potential downsides are great if people become pacified into simply accepting the digital world as large corporations dictate it, as opposed to having the knowledge to shape their own world. This is where today’s youth can perhaps benefit from a lesson on the intersection of data literacy and recent history.
“We went from the ’90s where it’s an open hackable Web, you make your website, Geocities, you could be your quirky fun self and find like-minded people…to two rats in the cage talking to each other about which cage they preferred,” Wang said.
“That’s not to say everything should be open and Linux everywhere,” he continued. “But it’s more about, do you want to live in a world where the edifice is completely immutable and you are just like this creature living in these walls, in this digital cage? Or if we can make this so people are not content with that…it creates the market, it creates the demand, it creates the back pressure to demand some of these things.”
The discussion conjures images of the film “The Matrix,” which is a wholly fictional rendering of one possible future world. Surely, we’re not heading to that horrible place?
“We’ll get there if we don’t do something about it,” Wang warned. “It’s already dystopian.”