January 21, 2021

Governance, Privacy, and Ethics at the Forefront of Data in 2021

(amgun/Shutterstock)

As we gear up for 2021, it’s hard to envision data governance, privacy, and ethics not becoming more pressing topics for companies. While the rewards of using data are great, so too are the risks connected with abusing, losing, and misusing data, and those risks are becoming ever clearer.

We tapped our network of experts for their thoughts on the matter, as well as their predictions for how trends in governance, privacy, and ethics will shape how big data practitioners do their jobs this year and in the years to come.

Getting the upper hand on data privacy and governance isn’t easy, but thanks to consumer demand, it’s fast becoming a requirement in the U.S., according to Tomer Shiran, the co-founder and CTO of Dremio.

“Users are increasingly concerned about their online privacy making it much more likely that the United States will adopt regulations similar to Europe’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA),” Shiran writes. “This will require companies to double down on privacy and data governance in their data analytics infrastructure. Furthermore, companies will realize that data privacy and governance cannot be achieved with separate standalone tools, and instead must be implemented as an integral part of the analytics infrastructure. Because of this, data version control will become standard in cloud data lakes and open source technologies such as Project Nessie will enable companies to securely manage and govern data in an enterprise-wide platform.”
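The data version control Shiran describes can be pictured as Git-like commits over table metadata: each commit is an immutable snapshot of which files back which tables, and a branch is just a pointer to a head commit. The toy sketch below illustrates that idea only; it is not Project Nessie's actual API, and all class and method names here are invented:

```python
import hashlib
import json

class TableCatalog:
    """Toy Git-style versioning of table metadata (illustrative only)."""

    def __init__(self):
        self.commits = {}                # commit id -> {"tables": ..., "parent": ...}
        self.branches = {"main": None}   # branch name -> head commit id

    def commit(self, branch, tables):
        """Record an immutable snapshot of table locations on a branch."""
        parent = self.branches[branch]
        payload = json.dumps([tables, parent], sort_keys=True).encode()
        cid = hashlib.sha1(payload).hexdigest()[:8]
        self.commits[cid] = {"tables": dict(tables), "parent": parent}
        self.branches[branch] = cid
        return cid

    def tables_at(self, ref):
        """Resolve a branch name or commit id to its table snapshot."""
        cid = self.branches.get(ref, ref)
        return self.commits[cid]["tables"]
```

Because old commits are never mutated, an auditor can reproduce exactly what data a query saw at any point in time, which is what makes versioning useful for governance.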

Data sprawl is becoming a serious problem, and it makes governing data increasingly difficult, says Rick Hedeman, the senior director of business development for 1touch.io.

GDPR increasingly is the model countries are using for data security and privacy

“As we enter a new year, data sprawl continues to accelerate, data lakes are popping up all over the place, and information governance is getting much more difficult,” Hedeman says. “You generally think of healthcare and financial services as the industries storing a lot of personal data, but everybody is doing it now because that is where most of the value is for today’s companies. Companies in every industry see value in knowing as much about customer behavior and sentiment as possible, but it is largely a ‘collect first, ask questions later’ approach.”

While businesses must pay attention to data privacy and copyright concerns, the vast amounts of data that people share openly and freely on the Internet is simply too interesting and useful to pass up, says Ron Kol, CTO at Luminati.

“Online data can help businesses of any size address their customers’ wishes and demands, improve the overall customer service and quality of products, as well as anticipate market shifts as they are about to unfold,” Kol says. “For this reason, the business community will supercharge online data collection operations and continue to expedite their growth at an unprecedented rate well into 2021 without showing any signs of slowing down.”

The “Wild West” days of data sharing will end in 2021 as we enter a new era of consumer privacy, says Balaji Ganesan, the co-founder and CEO of data privacy startup Privacera.

Call your digital cowpokes home–the Wild West is done

“In 2021, regulatory legislation around the world will move towards increased control of personally identifiable information (PII) data to safeguard consumer privacy,” Ganesan says. “Countries are increasingly following the lead of the EU with GDPR, as the recent regulations for CCPA in California and LGPD in Brazil can attest. The latest politicization of coronavirus data, combined with the manually and bot-assisted dissemination of information and misinformation based on personal data leveraged out of social media platforms such as Facebook and Twitter, portends the end of the ‘wild west’ of personal information on the Internet and will begin a new era of consumer privacy.”

GDPR is becoming the benchmark for privacy policy around the world. However, Neil Sweeney, CEO of Killi, has spotted a problem with the practical implementation of GDPR, specifically when it comes to how big tech firms are using (or abusing) personal identifiers.

“Tech giants are creating this general public perception that we’re shifting towards mobile IDs and email hashes to offset third-party cookies to benefit the privacy of consumers,” he writes. “But the real reason this is happening is so that Apple and Google can make their ecosystems even stronger. We all talk about responsible and ethical media, consent and privacy, but the industry is moving away from a reliable identifier, sacrificing privacy for survival.”
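The “email hashes” Sweeney refers to are typically just a cryptographic hash of a normalized address. A minimal sketch (illustrative, not any vendor’s implementation) shows why this is pseudonymization rather than anonymization: the same address always yields the same ID, so any party holding the address can recompute and match it:

```python
import hashlib

def hashed_email_id(email: str) -> str:
    """Derive a pseudonymous identifier from an email address.

    Note: a plain hash is pseudonymization, not anonymization --
    anyone who knows the email can recompute the identical ID,
    which is exactly what makes it usable for cross-site matching.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

That recomputability is the crux of the privacy critique: a stable hashed identifier links activity across properties just as effectively as a third-party cookie did.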

There are sizable technical challenges in governing and securing massive data troves, but in 2021, this space will find a new champion in the form of AI, says Keith Neilson, a technical evangelist at CloudSphere.

Algorithms monitoring algorithms will become increasingly common to prevent abuse of data (whiteMocca/Shutterstock)

“Cloud governance is an increasingly complex task and is quickly reaching a point where it’s impossible for humans to manage alone,” Neilson says. “AI will increasingly be relied on in the coming year to maintain cloud hygiene by streamlining workflows, managing changes, and archiving. Once proper cloud hygiene is established and maintained with AI, it will also be used as a strategic predictive knowledge tool. By predicting and addressing threats and vulnerabilities, AI will help enterprises create the best possible outcome for their cloud environments. Leveraging AI as a strategic asset will empower CIOs to make informed decisions about their cloud environments, such as evaluating costs and compliance risks.”

The twin explosions of big data and big data tech are bringing new computational models into existence. That will force data privacy to evolve in 2021, says Eliano Marques, executive vice president for data and AI at Protegrity.

“Decades ago, before smartphones existed and the Internet came to prominence, data had a sole home: the database,” Marques writes. “Data would be moved from database to database with protection for each application. With data now residing almost everywhere, the privacy and security of that data must evolve to protect it wherever it’s managed, moved to, and analyzed. Companies are increasingly adopting analytics and machine learning systems across organizational functions such as human resources, operations, and customer success to make better business decisions by tapping into sensitive customer and corporate data….Companies should implement data protection that can evolve with these new computation requirements, so data can move safely across locations and datasets.”

There’s another secret weapon at our disposal for governing and securing data, according to Jans Aasman, the CEO of Franz: the knowledge graph.

“It’s now clear that U.S. companies who want to participate in the European market have to adhere to the GDPR as well as the CCPA in the U.S. or run the risk of hefty fines,” Aasman says. “In 2021, we’ll see the first companies create knowledge graphs that know for every individual customer where every data element is for that particular customer. These knowledge graphs will be used to automatically delete all that data (if allowed by other regulations) and keep the knowledge graph for deleted, or now better protected, data for compliance purposes.”
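Aasman’s idea can be sketched in miniature: a map from each customer to where their data elements live, plus an erasure operation that deletes the elements themselves but retains an audit record of what was erased. This is a hypothetical illustration of the pattern, not Franz’s product:

```python
from datetime import datetime, timezone

class CustomerDataGraph:
    """Toy per-customer map of data elements, supporting erasure requests."""

    def __init__(self):
        self.elements = {}   # customer_id -> {system: [fields]}
        self.audit_log = []  # record of erasures, kept for compliance

    def register(self, customer_id, system, field):
        """Note that a field for this customer lives in a given system."""
        self.elements.setdefault(customer_id, {}).setdefault(system, []).append(field)

    def erase(self, customer_id):
        """Service a right-to-erasure request: drop the data elements,
        but keep a record of what was erased and when."""
        removed = self.elements.pop(customer_id, {})
        self.audit_log.append({
            "customer": customer_id,
            "erased": {sys: list(fields) for sys, fields in removed.items()},
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return removed
```

In a real deployment the `erase` step would fan out deletion jobs to each source system; the point of the graph is knowing where to send them.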

Ethics are becoming increasingly important (Olivier Le Moal/Shutterstock)

The era in which tech companies had a regulatory free ride is over, and it’s time to adopt responsible machine learning, says Rachel Roumeliotis, vice president of AI and data content at O’Reilly Media.

“Data use is no longer a ‘wild west’ in which anything goes; there are legal and reputational consequences for using data improperly,” she writes. “Responsible machine learning (ML) is a movement to make AI systems accountable for the results they produce. Responsible ML includes explainable AI (systems that can explain why a decision was made), human-centered machine learning, regulatory compliance, ethics, interpretability, fairness, and building secure AI. Until now, corporate adoption of responsible ML has been lukewarm and reactive at best. In the next year, increased regulation (such as GDPR, CCPA), antitrust, and other legal forces will force companies to adopt responsible ML practices.”

Algorithmic bias is a real threat to the goal of achieving fair and equal treatment for people at the hands of AI models. Kevin Goldsmith, the CTO at Anaconda, predicts this year will bring a new realization of the dangers that bias poses.

“This year [2020], there have been many necessary conversations around bias and mitigation in AI algorithms and around how to address the societal impacts of algorithm-based personalization,” Goldsmith writes. “However, we need to continue development of tools that provide insight into the results of ML systems, reveal bias, and check drift in deployed models over time. This becomes ever more critical as more of these systems are put into production, to ensure that we’re not perpetuating or creating sources of harmful bias.”
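One common way to “check drift in deployed models over time,” as Goldsmith puts it, is to compare the live score distribution against the training-time baseline with the population stability index (PSI). A minimal sketch follows; the 0.2 threshold is an industry rule of thumb, not a standard:

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline (e.g. training-time) score distribution
    and live scores. Rule of thumb: PSI > 0.2 often signals drift
    worth investigating."""
    lo, hi = min(expected + actual), max(expected + actual)
    width = (hi - lo) / bins or 1.0

    def frac(values, b):
        # Fraction of values landing in bin b; the top edge is
        # closed on the last bin so the maximum value is counted.
        in_bin = sum(
            1 for v in values
            if lo + b * width <= v < lo + (b + 1) * width
            or (b == bins - 1 and v == hi)
        )
        return max(in_bin / len(values), 1e-6)  # avoid log(0)

    return sum(
        (frac(actual, b) - frac(expected, b))
        * math.log(frac(actual, b) / frac(expected, b))
        for b in range(bins)
    )
```

Monitoring systems run a check like this on a schedule and alert when the index crosses the chosen threshold, prompting retraining or investigation.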

Addressing bias in AI algorithms will be a top priority, and will lead to guidelines for how ethnicity is tracked with facial recognition systems, says Robert Prigge, CEO of Jumio.

Legal issues surrounding the use and abuse of AI are just starting to be fleshed out

“Enterprises are becoming increasingly concerned about demographic bias in AI algorithms (race, age, gender) and its effect on their brand and potential to raise legal issues,” Prigge writes. “Evaluating how vendors address demographic bias will become a top priority when selecting identity proofing solutions in 2021.”

Don’t be surprised to see more lawsuits filed due to bias in machine learning, says Jason Tan, the CEO of Sift.

“In 2021 we will see a marked increase in the number of lawsuits filed implicating artificial intelligence technologies,” Tan says. “While we’ve seen high-profile suits brought against companies over the last few years, AI is simply more prevalent in our everyday lives. As an immature technology, we’re going to see AI systems make more (and new) mistakes that carry real human impact. When mistakes are made, consumers will take legal action.”

We’ll see steady progress in the ethics of AI in 2021 and 2022, predicts Cindy Maike, vice president of industry solutions for Cloudera.

“Today, ethical AI conversations revolve around the anonymization of data,” she writes. “We’re already starting to see new legislation in Australia and Europe, and I believe the U.S. isn’t far behind. We need to work on anonymizing data for the good of society, and, furthermore, ensuring we have strong data governance that monitors how this data is being used. A big conversation over the past year was about how the enterprise data cloud can help companies simplify their governance and management of data and AI in the cloud, so we’re now taking this one step further with ethical AI.

“As we look to 2021, we will see the conversation of ethical AI and data governance be applied to multiple different areas, such as contact tracing (fighting COVID-19), connected vehicles and smart devices (who owns the data?), and personal cyber profiles (increased cyber footprint leading to privacy questions).”

AI ethics will become a board-level discussion in 2021, and hopefully kick off a new era of “responsible AI,” predicts Wilson Pang, the CTO of Appen.

Approaches to explainability will help open up black box models (amasterphotographer/Shutterstock)

“In 2021, as boards focus on closing the gap between AI’s potential benefits and the reality (only about 1 in 10 enterprises report obtaining ‘significant’ financial benefits from AI – MIT Sloan), they will increasingly mandate AI governance programs that incorporate the principles of ‘responsible AI,’” Pang writes. “Responsible AI sets out standards and best practices for the responsible training of data with the aim to improve quality, efficiency and transparency – including the elimination of bias in training data – while promoting inclusivity and collaboration. Responsible AI practices also include paying annotators fair wages and adhering to labor wellness guidelines and standards. Greater board involvement in AI projects will increase adoption of these standards by the larger technology community, which in turn will increase the value of AI to businesses, as well as trust in the use of AI by the public.”

Many AI systems operate as black boxes that are opaque and hinder the ability of outside observers to understand how they work. That’s causing a problem, which hopefully will be addressed in 2021, according to Clayton Davis, the head of data science at Modzy.

“Explainability isn’t a panacea, but today, it is one way organizations are able to understand AI predictions and learn to understand and trust the technology,” Davis writes. “Explainability for AI systems is crucial for building an audit trail and establishing trustworthy, reliable, and responsible AI.”

In 2021, business leaders will take a critical look at AI with a new legal, moral and ethical lens, says Ed Macosky, head of product for Boomi.

“The last few months put a critical eye on AI and the unknown biases they could have, especially in social media,” Macosky writes. “The warning will have come through to tech leaders loud and clear. Next year we’ll see businesses taking a new lens when adapting AI, especially for items like hiring. But even something as innocuous as hyperautomation will be looked at through this lens, too. Leaders will act with this forethought to ensure customers and the business trust in the AI and the insights it provides.”

While many data professionals practice their art in an ethical manner, there’s no single standard defining what ethics look like in big data and AI. That may change in 2021, says Jeremy Levy, CEO of Indicative.

“I think that within the next year we will see progress toward a code of ethics within the data analytics space, led by conscious companies who recognize the seriousness of potential abuses,” Levy says. “Perhaps the U.S. government will intervene and pass some version of its own GDPR, but I believe that technology companies will lead this charge. What Facebook has done with engagement data is not illegal, but we’ve seen that it can have deleterious effects on child development and on our personal habits. In the coming years, we will look back on the way companies used personal data in the 2010s and cringe in the way we do when we see people smoking on a plane in films from the 1960s.”

Ethical AI will become a hot topic in 2021, says Talend CTO Krishna Tamman. Unfortunately, it’s a really, really hard problem to solve.

“Companies are using data and AI to create solutions, but they may be bypassing human rights in terms of discrimination, surveillance, transparency, privacy, security, freedom of expression, the right to work, and access to public services,” Tamman says. “To avoid increasing reputational, regulatory and legal risks, ethical AI is imperative and will eventually give way to AI policy. AI policy will ensure a high standard of transparency and protective measures for people. In the data sphere, CEOs and CTOs will need to find ways to eliminate bias in algorithms through careful analysis, vetting and programming.”

Related Items:

2021 Prediction from the Edge and IoT

Peering Into the Crystal Ball of Advanced Analytics

2021 Predictions: Data Science

Datanami