December 13, 2023

Fourteen Big Data Predictions for 2024

Never before have the challenges of big data–how we store it, manage it, govern it, and use it–been so pressing. Advances in artificial intelligence may be the driving force in 2024, but that doesn’t mean a thing if your big data is out of control.

What will big data bring us in the new year? It’s anybody’s guess, really, as the future has proven difficult to predict in the past. For a big data forecast, we look to industry experts for insight.

Dave Stokes, a technology evangelist at database provider Percona, says there will be a spike in interest in vector databases. However, it won’t last a full trip around the sun.

“Vector databases will be the hot new area for discussion by many but will eventually be absorbed by relational databases after a few years,” Stokes predicts. “Every 10 or so years a ‘new’ database technology is proclaimed to be the end of relational databases, and developers jump on that bandwagon only to rediscover that the relational model is extremely flexible and relational database vendors can easily adapt new technologies into their products.”

The existence of disparate data silos has been a persistent thorn in the side of data engineers. But according to Hammerspace’s SVP of Marketing Molly Presley, 2024 will bring a glimmer of hope as a centralized form of data orchestration takes center stage.

“Organizations will start moving away from ‘store and copy’ to a world of data orchestration,” Presley says. “Driven by AI advancements, robust tools now exist to analyze data and tease out actionable insights. However, file storage infrastructure has not kept pace with these advancements. Unlike solutions that try to manage storage silos and distributed environments by moving file copies from one place to another, data orchestration helps organizations integrate data into a single namespace from different silos and locations and automates the placement of data when and where it’s most valuable, making it easier to analyze and derive insights.”

Most of the data that we store is of the unstructured variety. As it piles up, it becomes a real challenge to manage, but 2024 will bring new ways to manage it all, says Anand Babu “AB” Periasamy, co-founder and CEO at MinIO.

“In 2024, we’ll see an enterprise explosion of truly unstructured data (audio, video, meeting recordings, talks, presentations) as AI applications take flight. This is highly ‘learnable’ content from an AI perspective and gathering it into the AI data lake will greatly enhance the intelligence capacity of the enterprise as a whole, but it also comes with unique challenges,” Periasamy says. “There are distinct challenges with maintaining performance at tens of petabytes. Those generally cannot be solved with traditional SAN/NAS solutions–they require the attributes of a modern, highly performant object store. This is why most of the AI/ML technologies (i.e., OpenAI, Anthropic, Kubeflow) leverage object stores and why most databases are moving to be object storage centric.”

According to Forrester, unstructured data that’s managed by enterprises will double in 2024, opening up potentially lucrative new options for AI.

“Global data and analytics decision-makers say only 27% of their organizations’ managed data is unstructured,” the analyst group says. “Generative AI will double that as companies roll out more conversational experiences for customers and employees. Enterprises will scramble to store, analyze, and make sense of this deluge of unstructured data. This trend will show up in the data pipeline space, where 80% of new data pipelines built in 2024 will be for ingesting, processing, and storing unstructured data.”

In 2024, many enterprises around the world will implement a data-first architecture to simplify their data management strategies, says Jeff Heller, vice president of technology and operations at Faction, Inc.

“Companies are going through a paradigm shift; they either choose one cloud or over-architect to meet their needs,” Heller said. “In 2024, organizations will need to look at what kind of cloud works best for them to make the most of their data. Decisions being made based on short-term goals and not long-term growth will lead to a data lock up. Data needs to be accurate and accessible to make timely decisions. Managing data is becoming more intricate for organizations. The need for an efficient data management strategy is paramount. Enterprises will turn to solutions that offer access to a single dataset from a preferred location across all clouds, ensuring data accuracy and increased efficiency.”

The AI revolution is touching all aspects of life, including big data management, according to Ciaran Dynes, the chief product officer for data pipeline shop Matillion.

“The role of the data engineer has radically expanded over the past decade,” Dynes says. “The next 12 months will be the year that tech companies make life simpler for data engineers. Tools will come to market, be integrated into existing platforms to enable adding generative AI to existing data pipelines with the ability to deploy these models internally so that users can interact live with these models just like they already do with ChatGPT. Regardless of the tools that come to market, the next year will also see huge demand for data engineers to retrain to master prompt engineering, how to fine tune these models, how to massively increase their productivity. The next year will see data engineers’ lives get so much more interesting.”

How much do you value data engineers? According to Jeff Hollan, director of product management for Snowflake, you’re going to value them even more in 2024.

“There’s been a lot of chatter that the AI revolution will replace the role of data engineers,” Hollan says. “That’s not the case, and in fact their data expertise will be more critical than ever–just in new and different ways. To keep up with the evolving landscape, data engineers will need to understand how generative AI adds value. The data pipelines built and managed by data engineers will be perhaps the first place to connect with large language models for organizations to unlock value. Data engineers will be the ones who understand how to consume a model and plug it into a data pipeline to automate the extraction of value. They will also be expected to oversee and understand the AI work.”

You might feel as though your data is out of control when it’s being managed by a third party in the cloud. 2024 will be the year you start to take back control of your data, predicts Peter Shafton, the CTO of Ngrok.

“Data management in 2024 will significantly shift towards greater accessibility and control,” Shafton says. “While the past decade witnessed a rush towards cloud-based data solutions, the pendulum is swinging back towards more self-management. The reasons behind this shift are twofold: privacy and cost-effectiveness. The constant threat of data breaches and the need for more stringent access control have made businesses wary of relying solely on external cloud platforms. Additionally, the unpredictability of cloud data storage and processing costs has led organizations to seek more predictable and cost-effective solutions. This trend is also facilitated by a proliferation of accessible and user-friendly data management tools, often originating from open-source solutions pioneered by tech giants like Uber, Netflix, and Airbnb.”

The term “data intelligence” has gained currency over the past few years as a label for the assortment of data management tools organizations bring to bear on their data. The next 12 months will be make-or-break for the concept, says Jim Liddle, the chief innovation officer at Nasuni.

“A shocking number of companies store massive volumes of data simply because they don’t know what’s in it or whether they need it,” Liddle says. “Is the data accurate and up-to-date? Is it properly classified and ‘searchable’? Is it compliant? Does it contain personally identifiable information (PII), protected health information (PHI), or other sensitive information? Is it available on-demand or archived? In the coming year, companies across the board will be forced to come to terms with the data quality, governance, access, and storage requirements of AI before they can move forward with digital transformation or improvement programs to give them the desired competitive edge.”

Fail to maintain the quality and integrity of your data, and you can kiss your 2024 GenAI plans goodbye, says Armon Petrossian, CEO and co-founder of Coalesce.

“In 2024, the technology landscape will witness a transformative shift as data evolves from being a valuable asset to the lifeblood of thriving enterprises,” he says. “Organizations that overlook data quality, integrity, and lineage will be challenged to not only make informed decisions but also realize the full potential of generative AI, LLM and ML applications and use cases. As the year unfolds, I predict that organizations neglecting to craft robust data foundations and strategies will find it increasingly challenging to stay afloat in the swiftly evolving tech industry. Those who fail to adapt and prioritize data fundamentals will struggle to outpace their competitors and may even risk survival in this highly competitive environment.”

Data lineage poses a persistent challenge. In 2024, blockchain will come to the rescue, predicts Yeshwant Mummaneni, the chief engineer for cloud at Altair.

“As AI/ML models play key roles in critical decision-making, whether supervised by humans or in a completely autonomous fashion, model provenance/lineage becomes crucial,” Mummaneni says. “The foundational technology that powered blockchain to provide immutability of records, digital identities, signatures, and verifications leveraging cryptography will become a key aspect of enterprise AI to provide tamper proof model provenance.”
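The tamper-evidence property Mummaneni refers to can be illustrated with a hash chain, the basic structure underlying blockchain ledgers. What follows is a minimal sketch, not any vendor's implementation: the function names (`append_record`, `verify_chain`) and the record fields are hypothetical, and a production system would add digital signatures and some form of distributed agreement on top.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(payload, prev_hash):
    """Hash a record's payload together with the previous record's hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a provenance record (for example, a training run) to the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(payload, prev_hash),
    }
    chain.append(record)
    return record


def verify_chain(chain):
    """Return True only if every record's hash and back-link are intact."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(record["payload"], prev_hash):
            return False
        prev_hash = record["hash"]
    return True


# Hypothetical lineage for a made-up model.
chain = []
append_record(chain, {"model": "demo-model", "step": "trained", "dataset": "v1"})
append_record(chain, {"model": "demo-model", "step": "fine-tuned", "dataset": "v2"})
print(verify_chain(chain))  # True

# Quietly rewriting history breaks verification.
chain[0]["payload"]["dataset"] = "v3"
print(verify_chain(chain))  # False
```

Because each record's hash covers the previous record's hash, changing any earlier entry invalidates everything after it, which is what makes the lineage tamper-evident.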

Another big data trend that will be growing like ice crystals on a cold winter night in 2024: synthetic data. That’s according to Spiros Potamitis, a senior analytics product manager at SAS.

“Synthetic data will get a lot of traction as organizations face tighter regulations and sharing sensitive data across borders becomes more challenging,” Potamitis says. “Synthetic data can capture the statistical properties of the original data source with high accuracy to overcome regulatory barriers and unlock innovation for organizations.”
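To make the idea of capturing “statistical properties” concrete, here is a deliberately simple sketch that assumes a single numeric column that is roughly normally distributed. The data values and function names are invented for illustration. Commercial synthetic data tools model joint distributions across many columns and typically add privacy safeguards, none of which is shown here.

```python
import random
import statistics


def fit_gaussian(values):
    """Estimate the mean and sample standard deviation of the real column."""
    return statistics.mean(values), statistics.stdev(values)


def synthesize(values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to the real data."""
    mu, sigma = fit_gaussian(values)
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Made-up "sensitive" column, e.g. customer ages.
real_ages = [23, 35, 41, 29, 52, 47, 38, 31, 44, 36]
synthetic_ages = synthesize(real_ages, n=5000)

# The synthetic column tracks the real one's summary statistics
# without reusing any individual's actual record.
print("real mean/stdev:     ", fit_gaussian(real_ages))
print("synthetic mean/stdev:", fit_gaussian(synthetic_ages))
```

The synthetic values can then be shared or used for testing in place of the originals, on the assumption that the fitted distribution is a good enough stand-in for the analysis at hand.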

While your big data repository feels right, 2024 will be the year that data governance “shifts left,” according to ALTR CEO James Beecham.

“Organizations will implement data governance and security measures earlier in the data journey, to the left of a cloud data warehouse, which will not only protect sensitive information, but will also improve the overall quality of the data collected,” Beecham says. “With the increasing number of regulations regarding data privacy and security, companies that prioritize data governance and security early on will be better equipped to comply with these regulations. In 2024, expect to see a surge of companies prioritizing shift left data governance and security – allowing them to initiate strong data access governance and data security capabilities available on cloud data warehouses and lake houses and extending them back to the data as it leaves source systems.”

Data mesh kind of took a back seat to other tech trends in 2023 (we’re looking at you, GenAI), but in 2024, data mesh’s benefits will become too obvious to ignore, says Angel Viña, the CEO of Denodo.

“2024 will be a pivotal year for the ascent of data mesh, which embraces the inherently distributed nature of data,” Viña says. “In a data mesh, the role of IT shifts to providing the foundation for data domains to do their work, i.e., the creation and distribution of data products throughout the enterprise. The turning point will be the realization that data products should be treated with the same level of importance as any other product offering…. In this data-centric era, it is not enough to merely package data attractively; organizations need to enhance the entire end-user experience.”

Datanami