June 19, 2018

Why Relying on Data Hygienists Is Not a Realistic Way to Ensure Bias-Free AI

Roman Stanek

(Lagarto Film/Shutterstock)

There has been a lot of buzz lately around the new career paths that will crop up as artificial intelligence (AI) systems become more commonplace. In fact, Gartner predicts that, though AI will automate the jobs of 1.8 million people by 2020, AI will create 2.3 million jobs by that same year—a net increase of 500,000 jobs.

This is to be expected; AI will introduce numerous new priorities and requirements that we can only begin to speculate about now. These positions could include personality trainers for AI and algorithms forensics analysts, but one in particular that intrigues me is a data hygienist.

This is still a relatively new term—even Google wants to redirect you to results for dental hygienists. So what is it? Data hygienists will ensure that the data used to train AI systems is “free from any slanted perspective.” It requires someone to monitor and adjust the data that the machine is fed, so it processes only the purest, most accurate data possible. This sounds promising, but there is frankly no way that humans are capable of erasing their own biases to be useful to a machine.

At first glance, this seems like a reasonable role to rely on as the use of AI systems spreads within organizations. By nature, machines are objective. They take data in, process it, and determine the best course of action. In this respect, data quality is important, because the machine is only as good as the data that it has been provided. It makes sense to want to introduce a human element that can confirm that the data being used is the best available and is free of errors.

The Cognitive Bias Codex (Designed by John Manoogian III)

My point arises when we start to consider the issue of bias. The chances that a human—or even a machine for that matter—could operate with no bias whatsoever are basically zero.

In my office, I have a poster that displays the 188 known cognitive biases. I refer to this poster often to remind myself that we all may react a certain way or have motivations that are driven by biases that we may not even be aware that we have. These go beyond the racial, gender, or age biases that we are all so familiar with. They range from the backfire effect—where someone doubles down on their beliefs when they’re confronted with information that contradicts them—to anchoring, which is the tendency we all have to rely too heavily on one piece of information—usually the first piece that we encounter—when making a decision. What are the chances that a data hygienist will assess data without using even one of those 188 biases?

First, there’s no way a data hygienist will ever be completely free of human biases because there are simply too many that are so ingrained in the way we make decisions or process information. Second, it’s impossible for humans to interact with an AI system in a way that doesn’t pass some of our own biases on to the machine, regardless of the models we train or algorithms we build, test, and retest. The algorithm was created by someone who, by definition, has their own biases, which are present in models—even if they were never intended to be there.

Instead of saying that a data hygienist will help us ensure that AI is free of bias or that AI is free of bias in its own right, we need to instead be aware of the sheer number of biases that exist. We need to recognize that AI systems are built and tested by people who unintentionally teach them to respond to something in the same way that our own biases lead us to respond.

In the end, data is data. It’s tempting to say that it is untarnished by the biases that dictate how we interact with each other and how we process information, but this ignores the subtle ways in which biases work. With so many, it is impossible to get rid of them all. Rather, we all need to work to minimize the effects of our biases and to understand that, however careful we may be, our AI systems and our data will bear the mark of our human inclinations.

About the author: Roman Stanek is the CEO of GoodData, which he founded in 2007 with the mission to disrupt the business intelligence space and monetize big data. Prior to GoodData, Roman was founder and CEO of NetBeans, the leading Java development environment (acquired by Sun Microsystems in 1999) and Systinet, a leading SOA governance platform (acquired by Mercury Interactive, later Hewlett Packard, in 2006).

Related Items:

Are Our Expectations For AI Too High?

Four Mandates to Turn Your Underutilized Data Into Revenue

Applications: Artificial Intelligence

Technologies: Frameworks

Sectors: Financial Services

Vendors: GoodData

Tags: AI, data hygenist

Only registered users may comment. Register using the form below.

Check off newsletters you would like to receive*
- HPCwire
- EnterpriseTech
- Datanami
- Technology Conferences & Events
- Advanced Computing Job Bank
- Technology Product Showcase
Email*
Name*
First Last
Organization*
Job Function*
Industry*
Country*
City*
State*
Province*
- Please check here to receive valuable email offers from Datanami on behalf of our select partners.

Why Relying on Data Hygienists Is Not a Realistic Way to Ensure Bias-Free AI

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

April 25, 2024

April 24, 2024

April 23, 2024

Sponsored Partner Content

Get your Data AI Ready – Celebrate One Year of Deep Dish Data Virtual Series!

Supercharge Your Data Lake with Spark 3.3

Learn How to Build a Custom Chatbot Using a RAG Workflow in Minutes [Hands-on Demo]

Overcome ETL Bottlenecks with Metadata-driven Integration for the AI Era [Free Guide]

Gartner® Hype Cycle™ for Analytics and Business Intelligence 2023

The Art of Mastering Data Quality for AI and Analytics

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Top 6 Strategies for Reducing Data Warehouse Costs

Building an Operational Data Warehouse for Real-time Analytics

Sponsored Multimedia

The Power of DataOps: Bring Automation to Life
No Comments

Tactical Steps for Cloud Migration
No Comments

Immuta Data Access Platform
No Comments

Data Mesh: Fact or Fiction?
No Comments

Contributors

Featured Events

AI & Big Data Expo North America 2024

CDAO Canada Public Sector 2024

AI Hardware & Edge AI Summit Europe

AI Hardware & Edge AI Summit 2024

CDAO Government 2024

Why Relying on Data Hygienists Is Not a Realistic Way to Ensure Bias-Free AI

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

April 25, 2024

April 24, 2024

April 23, 2024

Most Read Features

Most Read News In Brief

Most Read This Just In

Sponsored Partner Content

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Sponsored Multimedia

Contributors

Featured Events

Share

Copy short link