AI Ethics and Data Governance: A Virtuous Cycle
As companies spend billions researching and developing AI, they’re facing meaningful questions related to ethics. What does responsible AI look like? How do you control bias? It’s all very new and cutting edge, and it has serious implications for society. But before companies can even begin to address the ethics questions, they should focus on more fundamental matters of data governance.
AI technology has advanced rapidly over the past five years, to the point that neural networks now outperform humans at some tasks, particularly in certain image classification systems. Companies can make a strong business case for utilizing these advanced AI capabilities to streamline their operations, boost profits, cut costs, and improve customer service.
But as powerful as the AI technology is, it can’t be implemented in an ethical manner if the underlying data is poorly managed and badly governed, says James Cotton, who is the international director of the Data Management Centre of Excellence at Information Builders’ Amsterdam office.
It’s critical to understand the relationship between data governance and AI ethics, Cotton says.
“One is foundational for the other,” he says. “You can’t preach being ethical or using data in an ethical way if you don’t know what you have, where it came from, how it’s being used, or what it’s being used for.”
GDPR: A Good Start
The challenge is that there is no standard recipe or approach for data governance that works for every business. Being compliant with the General Data Protection Regulation (GDPR) is a good start, but even the GDPR doesn’t go far enough to ensure good data governance in all cases, such as with location data, which the European Union is expected to address with the emerging e-Privacy Regulation (ePR) law in 2021.
The rules for what could be construed as good governance can vary for the same piece of data, depending on the context in which it was collected and how it’s being used, and that is one of the problems.
“We all tend to think of data as a bunch of 1s and 0s which we put on a great big heap of data there,” Cotton says. “The truth of the matter is, of course not all data is created equally. Certainly not all data is treated equally, and it probably shouldn’t be.”
For example, recording that a customer has red hair is not considered “personal data,” Cotton says. “There is very little personal implication there,” he says. “But the fact that I’m recording that in combination with the fact that he or she lives in a village of 20 people in the north of Finland — now all of a sudden that combined data set is probably able to identify a person individually and should be treated in a different way.”
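Cotton’s point can be illustrated with a small, hypothetical sketch: count how many records share each combination of attributes, and any combination shared by only one record can single a person out (the idea behind k-anonymity). The records and attribute names below are invented for illustration.

```python
from collections import Counter

# Hypothetical records: hair colour alone is innocuous, but combined
# with a small locality it can single a person out.
records = [
    {"hair": "red", "village": "Utsjoki"},
    {"hair": "brown", "village": "Helsinki"},
    {"hair": "brown", "village": "Helsinki"},
    {"hair": "red", "village": "Helsinki"},
]

def group_sizes(rows, keys):
    """Count how many records share each combination of the given attributes."""
    return Counter(tuple(r[k] for k in keys) for r in rows)

# Hair colour alone: every value is shared, so re-identification risk is low.
by_hair = group_sizes(records, ["hair"])

# Hair colour + village: some combinations have a group size of 1,
# meaning the combined data set identifies an individual.
combined = group_sizes(records, ["hair", "village"])
risky = [combo for combo, n in combined.items() if n == 1]
print(risky)  # combinations that pinpoint a single person
```

A governance process would flag such combinations and require the combined data set to be treated as personal data, even though each column on its own is harmless.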
This constantly changing, morphing nature of data can wreak havoc on a company’s ability to comply with GDPR, let alone provide meaningful information to train AI models over a long period of time. According to Cotton, the best practice is not to use data outside of the context associated with the original data’s collection.
But there are other aspects of data – and data governance – that have a more direct impact on AI and model training, and one of those is bad data.
Data’s Just Wrong
It’s safe to assume that all companies with AI ambitions desire to have the most accurate and correct data, because that will increase the AI’s effectiveness and usefulness. Since data accuracy is a byproduct of good data governance, it’s in a company’s self-interest to adopt good data governance practices.
This is the virtuous part of the cycle: good data governance leads to better AI, and better AI in turn supports more ethical AI. Customers who are confident that a company implements AI in an ethical manner will likely be more willing to share more personal and better data with it, knowing that the data won’t be abused, that it is accurate, and that they may get something beneficial out of it, too.
But this whole cycle comes crashing down if the data quality is low to begin with.
“We know that a large percentage of the world’s data are just incorrect,” Cotton says. “These data quality errors come from all sorts of places. The problem is, once we apply them to AI, regardless of the ethical questions about how an AI should handle them, the AI is just going to make bad decisions at scale.”
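One common defense against bad decisions at scale is a quality gate that rejects invalid records before they reach model training. The rules and field names below are a toy illustration, not any particular vendor’s product.

```python
# A toy quality gate: hypothetical validity rules applied to each record
# before it is allowed into a training set.
def valid(record: dict) -> bool:
    checks = [
        record.get("age") is not None and 0 < record["age"] < 120,
        bool(record.get("country")),
    ]
    return all(checks)

rows = [
    {"age": 34, "country": "NL"},
    {"age": -1, "country": "NL"},   # impossible value
    {"age": 29, "country": ""},     # missing field
]
clean = [r for r in rows if valid(r)]
print(len(clean), "of", len(rows), "records pass")  # 1 of 3 records pass
```

Filtering is the simplest response; a fuller governance process would also trace each rejected record back to its source so the error can be fixed where it originated.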
Information Builders sells data governance solutions as part of its suite. The software helps customers answer questions like: Where did this piece of data come from? Why was it collected in the first place? What did we do to it on its journey to what it looks like now? For what reasons? Who touched it last? Where is it being used? How is it being utilized? Do we have consent for that?
“Any proper data management project has these types of questions in it,” Cotton says. “These days, when more and more companies and prospects are starting to realize the value of the data they have in the organization and starting to really see that as a strategic asset, they’re also starting to put more emphasis on the actual management and governance of that information and trying to organize and use it in the best way possible.”
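The lineage questions above can be sketched as a simple record structure. This is a minimal, hypothetical illustration of the concept; commercial data governance suites track far richer metadata.

```python
from dataclasses import dataclass, field

# A minimal, hypothetical lineage record mirroring the governance
# questions: source, purpose, journey, last touch, usage, and consent.
@dataclass
class LineageRecord:
    field_name: str
    source: str                       # where the data came from
    collection_purpose: str           # why it was collected
    transformations: list = field(default_factory=list)   # its journey so far
    last_touched_by: str = ""         # who touched it last
    used_in: list = field(default_factory=list)           # where it is being used
    consented_purposes: list = field(default_factory=list)

    def use_permitted(self, purpose: str) -> bool:
        """Flag any use that falls outside the purposes consent was given for."""
        return purpose in self.consented_purposes

rec = LineageRecord(
    field_name="hair_colour",
    source="web_signup_form",
    collection_purpose="personalised marketing",
    consented_purposes=["personalised marketing"],
)
rec.transformations.append("normalised to lowercase")
print(rec.use_permitted("personalised marketing"))  # True
print(rec.use_permitted("model training"))          # False
```

A check like `use_permitted` also encodes Cotton’s earlier advice: data should not be used outside the context of its original collection.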
Guidelines for AI Ethics
GDPR (and other similar laws) provide minimum standards in data governance for companies to meet, or face fines as a consequence. The top data-driven companies tend to go above and beyond the GDPR requirements in terms of how they interact with people and their data, but GDPR remains the low bar for the rest of us.
Currently, there is no low bar when it comes to AI ethics. And don’t expect an AI ethics law anytime soon, says Vic Katyal, a principal at Deloitte Consulting.
“There is some level of dialog emerging for people who deal with these technologies,” Katyal tells Datanami. “Obviously you’ve seen 40 or 50 SEC registrants disclose that there could be AI related risks they are concerned about. But when you talk about where organizations are moving and what they’re spending on and the journey they’ve embarked on, at this point I would say very much in its infancy.”
Companies in regulated industries are further along with addressing AI ethics, just as they tend to be further along with their data governance projects. They’re the ones who are taking a leadership role in AI ethics, Katyal says. But progress in defining a standard for AI ethics is being hampered by a lack of data governance and data privacy regulation in the United States, he says.
“We can’t even agree on a commonality of data privacy rules” in the U.S., he says. “Each state is coming up with its own stuff, at least at this point. There’s no effort to create a national rule around even basic data privacy notification, localization-type rules.”
If an AI ethics law is created, it will likely come out of Europe, and not for at least three to five years, Katyal predicted.
“People are still trying to get their arms around what AI is, so I think it’s a while out,” he continues. “I would expect to see more things happening around the data governance side of things, which I would see as privacy standards and rules and alignment there. If we can get some level of agreement it would be better.”
However, just because there are currently no regulations around the ethical use of AI, it does not mean that companies should not be thinking about it.
AI Ethics Now
Katyal provided these tips on how to start thinking about AI in an organization.
“Number one, you’ve got to prevent the proliferation of AI,” he says. “You’ve got to put control structures into managing the algorithms as well as the data side of it.”
“If you’re a regulated industry, any decision that the algorithm is making could impact or violate any regulation in some other way,” he continues. “If you’re not in a regulated industry, it’s a reputation risk, because an algorithm could go haywire, cause some problems, and cause either brand, reputation, people, or customer harm that you may have to pay consequences for.”
Companies at least should get their arms around what’s being built and who’s building them. “Put governance around it because data governance obviously is moving along. Put some governance around the algorithms. Make sure you have visibility into what’s going on. Build control structures.
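Katyal’s advice to know what’s being built and who’s building it can be sketched as a bare-bones algorithm register. Everything here, from the entry fields to the deployment rule, is a hypothetical illustration of the control structure he describes, not a prescribed implementation.

```python
from dataclasses import dataclass

# A minimal, hypothetical "algorithm register": an inventory of models,
# their owners, and their review status, giving visibility before any
# regulation requires it.
@dataclass
class ModelEntry:
    name: str
    owner: str
    purpose: str
    regulated: bool          # does it make decisions in a regulated domain?
    approved: bool = False   # has it passed internal review?

registry: dict = {}

def register(entry: ModelEntry) -> None:
    """Add a model to the inventory so nothing ships untracked."""
    registry[entry.name] = entry

def deployable(name: str) -> bool:
    """Only reviewed models with an identified owner may be deployed."""
    e = registry.get(name)
    return e is not None and e.approved and bool(e.owner)

register(ModelEntry("churn_scorer", owner="risk-team",
                    purpose="retention", regulated=True))
print(deployable("churn_scorer"))   # False until reviewed
registry["churn_scorer"].approved = True
print(deployable("churn_scorer"))   # True
```

The point is not the code but the control structure: an inventory, a named owner, and a gate that blocks deployment until review has happened.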
“This is good practice because … while there may not be a regulatory push to do it, it’s the right thing to do.”