January 10, 2022

The Upcoming Year in Big Data: A 2022 Preview


The world of big data is a never-ending roller coaster of new technologies, new techniques, and an ever-growing tsunami of data. As we roll into 2022, we turn to the community of big data practitioners and solution providers for insight into what trends might move the needle in the new year.

Data architectures are in a state of flux at the moment. According to Ravi Shankar, the SVP and CMO at data virtualization provider Denodo, the related concepts of data fabric and data mesh should be on your radar.

“In 2022, organizations will create a data fabric to drive enterprise-wide data and analytics and to automate many of the data exploration, ingestion, integration, and preparation tasks,” Shankar says. “By enabling organizations to choose their preferred tools, these data fabrics will reduce time-to-delivery and make it a preferred data management approach in the coming year.”

Similarly, the data mesh (which has both similarities to and differences from the data fabric) will become more enticing. “As organizations grow in size and complexity, central data teams are forced to deal with a wide array of functional units and associated data consumers,” Shankar says. “This makes it difficult to understand the data requirements for all cross functional teams and offer the right set of data products to their consumers. Data mesh is a new decentralized data architecture approach for data analytics that aims to remove bottlenecks and take data decisions closer to those who understand the data.”

If you’re building your big data architecture atop a data warehouse foundation, Tomer Shiran, founder and CPO of Dremio, would like a word with you.

“We hear it again and again: data warehouses are expensive, and costs are out of control. Newer technologies like data lakehouses will gain even more traction in 2022 because they have more to offer the enterprise than older data warehouse models that lock them in and drive up costs,” Shiran says.

Unidentified Aerial Phenomena (UAPs) as seen from the cockpit display of an F-18 (Department of Defense, US Navy)

Data lakes will get even easier to use in 2022, making them as easy to get started with as any data warehouse, Shiran continues. “Even non-technical workers will be able to easily get up and running on a data lake–thus eliminating the complexity and high costs of older data warehouse models,” he says. “As a result of these significant cost savings and lower barriers to entry, we can expect to see smaller companies and start-ups embrace this model, in addition to larger companies.”

Entity extraction is a well-known big data problem. In 2022, big data tech will let us extract more insight from the entities in our airspace, predicts Kinetica co-founder Amit Vij.

“This year’s US Intelligence report on UFOs was a landmark of transparency and insight into UAP/UFO sightings,” Vij writes. “However, it did not provide any definitive conclusions on the true nature of UAPs. That’s partly a function of the limitations of legacy technologies available. In 2022, thanks to projects like NORAD’s Pathfinder that are planned to go fully operational, we’ll start to gain a clearer picture of UAPs. These new capabilities enable tracking and classifying moving objects at significantly increased levels of sophistication based on AI and 1000X faster processing due to advances in parallel processing through vectorization. While there’s no guarantee of discovering aliens next year, governments and defense agencies will be able to demystify more sightings and share findings faster than before.”

The window of opportunity to act upon data gets smaller all the time. In 2022, the window is essentially zero, necessitating the adoption of real-time analytics for certain use cases. The good news is that this gets much easier in 2022, thanks to the democratization of real-time data, says Dhruba Borthakur, co-founder and CTO of Rockset.

“The fresher the data, the more valuable it is. Data-driven companies such as Doordash and Uber proved this by building industry-disrupting businesses on the backs of real-time analytics,” Borthakur writes. “Every other business is now feeling the pressure to take advantage of real-time data to provide instant, personalized customer service, automate operational decision making, or feed ML models with the freshest data. Businesses that provide their developers unfettered access to real-time data in 2022, without requiring them to be data engineering heroes, will leap ahead of laggards and reap the benefits.”

As the data continues to pile up, organizations’ analytics engines must run faster and longer to keep up. In the view of Sam Mahalingam, CTO of Altair, this necessitates a shift to continuous intelligence.

Real-time analysis techniques are growing in popularity (voyager624/Shutterstock)

“Businesses have more data and more data sources to handle than ever before. As manufacturers and other businesses are pushed to deliver new product ideas with greater efficiency, new data analytics models such as augmented analytics and continuous intelligence (CI) will be essential to ideation and critical thinking for advancement,” Mahalingam says. “For instance, with CI, real-time analytics are integrated into business operations, enabling users to get the most out of their data. Since CI exists in a ‘frictionless state,’ businesses can leverage these continuous, AI-driven insights based on automated calculations and specific recommendations to make actionable, forward-thinking decisions, right as data events unfold. This more accurate information model benefits those business areas that need timely response, including supply chain, fraud detection, customer experience, and IoT-enabled manufacturing.”

Another backer of the data fabric approach is Stefan Sigg, the chief product officer at Software AG.

“Data management challenges will not go away in 2022, so enterprises will need to build and embrace data fabric architectures for agility and dynamic decision-making,” Sigg says. “Instead of simply sending data down a road to be stored, scaled or analyzed, a data fabric is able to direct data into a holding area so it can be used while it’s most relevant. With big data supporting the business goals of 72% of organizations, proper implementation of a data fabric is a natural evolution that helps companies to be more informed, more quickly.”

Concerned that cloud-based data warehouses are re-creating the lock-in that plagued on-prem kits? So is Dipti Borkar, the co-founder and Chief Product Officer at Presto provider Ahana. The solution, she says, is the “OpenFlake,” or the Open Data Lake for Warehouse Workloads.

“Data warehouses like Snowflake are the new Teradata with proprietary formats,” Borkar writes. “2022 will be about the Open Data Lake Analytics stack that allows for open formats, open source, open cloud, and no vendor lock-in.”

Data may exist in silos, but that doesn’t mean data access should be narrow, says Hammerspace CEO David Flynn, who predicts that data will become more of a globally accessible resource in 2022.

“Work from home will frequently be defined not just as work from your house near the office, but to, work from your home country,” Flynn writes. “Data will need to be a globally accessible resource to the workforce while remaining high-performance locally to take advantage of the high-efficiency, special purpose hardware.”

Do cloud data warehouses recreate the lock-in of on-prem systems? (ramcreations/Shutterstock)

In 2022, data availability will become more of a priority, says Betsy Doughty, the vice president of corporate marketing for Spectra Logic.

“With remote working on the rise, the ability to ensure data availability at any location at any time is becoming increasingly important,” she writes. “Organizations will continue to explore how best to integrate cloud into their IT strategies to enable low latency data availability in 2022.”

2022 will be the year when businesses create new pipelines and analytics for new business processes, predicts Octopai’s CEO Yael Ben Arie.

“For instance, some financial companies are coming up with new systems to keep up with the infusion of data they get regularly and balance that with the need for regulatory compliance,” Ben Arie writes. “We should also expect to see the empowerment of any corporate employee, even non-BI analysts accessing and leveraging data. There will be a big push to enable self-service data users, or regular employees to access data at their will to drive individual business decisions.”

We’re on the cusp of moving beyond “narrow AI” to unlock decision intelligence, which Peak’s Co-Founder and CEO Richard Potter believes “is the most important B2B movement of a generation.”

“But to solve businesses’ biggest challenges, AI needs to be focused on an outcome, on delivering against business objectives and driving tangible results,” Potter writes. “Businesses that make great decisions consistently win. Which is why decision intelligence, the commercial application of AI to the decision-making process, is how the vast majority of businesses will adopt AI.”

We are embarking upon the era of Database 3.0, which will see a great consolidation of data stores, predicts Raj Verma, the CEO of SingleStore.

“The first generation of databases were the Oracles and Informix and DB2. The second was this database sprawl where you saw the influx of DB2, Couchbase went public, and the other 300,” Verma writes. “The next generation of databases is the consolidation of these data platforms and types into a database that can handle modern data, and do it in a hybrid, multi-cloud manner with extremely low latency.”

One of the characteristics of the Database 3.0 era will be greater data intensity and data complexity. These will become measures of an organization’s digital dexterity, but they also must be kept in check, says Oliver Schabenberger, SingleStore’s Chief Innovation Officer.

Interest in DataOps is building (Good_Stock/Shutterstock)

“Data intensity increases naturally as more constraints are connected to the data: variety, volume or velocity, geographic distribution, diverse types and structure, diverse use cases, automation, privacy, security, number of producers and consumers,” Schabenberger writes. “Data intensity is positive, but if not properly managed will lead to complexity that adds cost and friction. While data intensity today is mostly an attribute of applications, I predict that by 2024 the majority of organizations will have objectives, key results and KPIs tied to data intensity to capture their digital maturity.”

Organizations have no shortage of data. But putting the data glut to use will require disciplined DataOps practitioners, says BMC’s Chief Product Officer, Ali Siddiqui.

“In 2022 DataOps will become an effective way to manage and integrate organizational data so enterprises can adjust, respond, predict, and act autonomously based on applicable scenarios,” Siddiqui writes. “By implementing DataOps processes and creating teams of dedicated data engineers and scientists, organizations can gain insights that drive decision making in near real-time and become autonomous digital enterprises.”

Do you trust your data? If you answered “yes,” it is likely with some caveats. But things are set to improve, according to Susan Cook, CEO of Zaloni, who says better days are ahead in terms of solutions that will increase our faith and trust in data.

“We’ve only seen the tip of the iceberg of technology solutions that are truly able to handle data accuracy and relevancy,” Cook says. “In 2022, we will leverage machine learning and automation more fully to manage, govern and improve data. Once we do that, enterprises will have more trust and faith that they have good quality data, which will result in much faster and better decisions.”

The cross-fertilization of data tools generally benefits all users. In 2022, we’ll see data engineers adopting tools originally developed for data scientists, says Matthew Halliday, co-founder and executive vice president of product at Incorta.

“Data engineers will increasingly use AI-based tools in their day-to-day work,” Halliday writes. “To support this, more analytics vendors will incorporate AI programmatic capabilities in their platforms, opening up new opportunities for data engineers. This will also blur the line between data engineering and data science, providing new opportunities for innovation.”

Heraclitus would have fit right into today’s fast-changing world (Naci-Yavus/Shutterstock)

Think data was dynamic and fast-changing last year? You haven’t seen anything yet, says Lenley Hensarling, chief strategy officer at Aerospike.

“As Greek philosopher Heraclitus once said, ‘There is nothing permanent except change,’” Hensarling says. “A big change in 2022 will be–change. Data will change faster and more frequently than ever before. It will no longer be acceptable to analyze massive amounts of static data once per month, once per week, or even once per day. Organizations will need to glean insights from streaming data in real time to find new patterns and discover and act on them. Navigating data is like running whitewater, where you need to adapt instantly to a changing environment. Those that learn to run the rapids will succeed.”

In 2022, the focus of data management will shift from the mechanics of moving and storing information to enabling business outcomes, says Krishna Tammana, CTO of Talend.

“As intelligent automation continues to transform the way businesses operate, organizations are starting to realize that AI/ML is only as good as the data they feed into it,” Tammana writes. “If businesses can ensure healthy data at scale and at the speed of business, they will be able to truly unlock the power of data analytics and deliver successful business outcomes.”
