April 12, 2021

Why AI Is Failing for Enterprises: Predetermination Bias

Jed Yueh

Our best practices for managing data are ancient. Literally.

For tens of thousands of years we’ve managed the future by predetermining the resources we think we’ll need, limiting our futures to what we can foresee. We’re now in the age of AI, with a radically more open future. AI, especially machine learning, can learn from seemingly insignificant data, often to our human delight and surprise.

So, it’s time to rethink our processes and mindset when it comes to data. We need to stop limiting data to the futures we expect. That’s not easy because predetermination is a deep habit that guides much of our lives. For instance, to save space, time, and money, we fill our kitchen pantries with only the ingredients we think we’ll need.

But that predetermination inhibits innovation. We’re unlikely to try, say, a new Indonesian recipe if we’re missing kaffir lime leaves.

Worse, our imagination itself is muted by our awareness of the supplies on hand. This is an even more pernicious type of predetermination because we often don’t recognize that it’s happening.

Call it predetermination bias.

Culling Bias

Predetermination bias is often hardcoded into the ETL (extract, transform, load) pipelines enterprises use to manage data today. Typically only a small stream of application data, about 5% to 10% of the total, makes it through the pipeline and lands in a data warehouse for analysis.

Data lakes and NoSQL data stores improve upon the process by switching ETL to ELT. That’s a good start, but it’s not enough. Extracting data leaves most of the data behind. Loading results in yet another copy of data that keeps getting bigger and bigger. And transforming data still strips off what is unique in order to standardize it.

Data warehouses, data lakes, and NoSQL stores are useful, just like kitchen pantries. But they reduce the total available information.
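
To make the culling concrete, here is a minimal sketch of a transform step that projects each source record onto a schema chosen in advance. The column names, the sample record, and the helper functions are hypothetical, invented only for illustration; real pipelines are far more involved.

```python
# Hypothetical warehouse schema: the fields someone decided in advance were worth keeping.
WAREHOUSE_COLUMNS = ("order_id", "amount")


def transform(record):
    """Project a source record onto the predetermined schema, dropping everything else."""
    return {col: record.get(col) for col in WAREHOUSE_COLUMNS}


def culled_fields(record):
    """List the source fields that never reach the warehouse."""
    return sorted(set(record) - set(WAREHOUSE_COLUMNS))


# Hypothetical application record with more detail than the schema anticipates.
source_record = {
    "order_id": 1001,
    "amount": 42.5,
    "device": "mobile",
    "referrer": "newsletter",
    "seconds_on_page": 87,
}

print(transform(source_record))      # what lands in the warehouse
print(culled_fields(source_record))  # what is left behind
```

In this toy case three of the five fields are discarded before anyone can ask whether a model might have found them useful.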

In the Age of Machine Learning, we’re learning the importance of what might seem like insignificant data. Neural networks learn their weights from webs of correlations often too complex for our minds to unravel. The amount and complexity of the data they process are why machines can now beat the world’s human champions at chess and can diagnose diseases from data in ways that doctors and computer scientists cannot understand.

Then there’s the lesson from the tech giants. When your applications service billions of users, you have to rely on algorithms to make immediate judgments based on as much data as possible.

Now it’s time to end data predetermination for the rest of the world’s businesses. We need a new model for thinking about and acting on data.

Leave No Data Behind

If predetermination, ETL, and ELT are passé, then what does a modern data process look like?

Here are seven guidelines for modern data management:

Access to all the primary data: Most of the valuable data in an enterprise sits in enterprise applications, from mainframe to cloud native. Instead of settling for a thin stream of ETL or ELT data, we need access to all of the primary data across multi-generational platforms. And we need an efficient way to access it that does not impact or add risk to production applications.

API-driven access: Scurrying from department to department to learn the magical phrases to get data access takes too long. In today’s world, we need APIs (application programming interfaces) that provide simple, uniform ways to request data from all sources.

Data privacy and compliance: Regulatory compliance cannot be an afterthought. Today, companies must pursue responsible innovation by securing data used in analytics and training AI models. Enterprise data needs to be masked to keep personally identifiable and other sensitive information from reaching the wrong hands.

Data history and integrity: Data changes quickly over time, and data relationships matter. When feeding data from different sources into machine learning, it’s critical to make sure all the data comes from precisely the same time to preserve relationship integrity. In addition, historical data can be used to iteratively train and test machine learning models to tune and improve outcomes.

Version control: The world continues to change, so models that work today may fail tomorrow. That means we need version control: access to the source data used to train the failing models, so we can perform drift analysis, see what has changed in the data, and properly retune and retrain our models.

Automation: While machine learning is sophisticated and high tech, most of the daily work is manual and prosaic—data wrangling, preparation, cleanup, and separating datasets for training, testing, and validation. All of these operations add friction to the process. Automation overcomes that friction, enabling faster and more effective use of data.

Data anywhere: Today, enterprise applications live across the multi-cloud: SaaS, private clouds, and public clouds. And cloud vendors are constantly evolving and competing on AI technology offerings. So it’s critical for companies to be able to sync compliant data wherever they need it to best process data for strategic advantage.
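
The version control guideline above calls for drift analysis against the data a model was trained on. As a minimal sketch, assuming the training snapshot and the current data are both available as plain dictionaries of column name to numeric values, one simple check flags columns whose mean has moved relative to the training spread. The function names, the sample data, and the 0.5 threshold are illustrative assumptions, and a standardized mean shift is only one of many possible drift measures.

```python
from statistics import mean, pstdev


def mean_shift(train, current):
    """Distance between the current and training means, in training standard deviations."""
    spread = pstdev(train)
    if spread == 0:
        # A constant training column: any change in the mean counts as unbounded drift.
        return 0.0 if mean(current) == mean(train) else float("inf")
    return abs(mean(current) - mean(train)) / spread


def drifted_columns(train_snapshot, current_snapshot, threshold=0.5):
    """Return the shared columns whose mean moved by more than `threshold` std devs."""
    return sorted(
        col
        for col in train_snapshot
        if col in current_snapshot
        and mean_shift(train_snapshot[col], current_snapshot[col]) > threshold
    )


# Hypothetical snapshots: the data the model was trained on versus what it sees today.
train_snapshot = {"order_value": [20.0, 22.0, 19.0, 21.0], "items": [1, 2, 1, 2]}
current_snapshot = {"order_value": [30.0, 33.0, 29.0, 31.0], "items": [1, 2, 2, 1]}

print(drifted_columns(train_snapshot, current_snapshot))
```

A check like this only works if the exact training snapshot was kept, which is the point of the guideline.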

These seven guidelines don’t spell the end of the data warehouse. There are times when we know exactly the data we need now and in the future. In those situations, predetermination still holds value.

But in the land of tech giants and AI, we need a new model to keep up with the data and the times.

About the author: Jedidiah Yueh has led two waves of disruption in data management, first as founding CEO of Avamar (sold to EMC in 2006 for $165M), which pioneered data de-duplication and shipped one of the leading products in data backup and recovery, with over 20,000 customers and $5B in cumulative sales. After Avamar, Jed founded Delphix, which provides an API-first data platform to accelerate digital transformation for over 25% of the Global 100 and has surpassed $100 million in ARR. In 2013, the San Francisco Business Times named Jed CEO of the Year. Jed is the bestselling author of Disrupt or Die, a book that refutes conventional ideas on innovation with proven frameworks from Silicon Valley. After being designated a US Presidential Scholar by George H. W. Bush, Jed graduated Phi Beta Kappa, magna cum laude from Harvard, while working three jobs, including teaching at a local high school.

Applications: Artificial Intelligence, Data Mining

Vendors: Delphix

Tags: ELT, ETL, machine learning, predetermination bias
