In addition to the explosion of data volumes, many organizations are struggling with an explosion in the number of data sources and data silos. Managing data in this fluid, ever-changing environment is a major challenge for would-be data-driven organizations, but one pattern that offers potential salvation for the stressed data architect is the data fabric.
Data fabrics aren’t new. We’ve been writing about them for several years here at Datanami. In the early days, the definition of a data fabric was a bit loose. But lately, it’s begun to harden and the core elements of a data fabric have coalesced into a configuration that’s finding traction in the real world.
Forrester analyst Noel Yuhanna was one of the early proponents of the data fabric. In the latest Forrester Wave: Enterprise Data Fabric, Q2 2022, Yuhanna dived into the benefits of the data fabric and dissected the offerings of 15 data fabric vendors.
“Today, delayed insights can have a devastating effect on a firm’s ability to win, serve, and retain customers,” Yuhanna wrote in the Wave report. “Organizations want real-time, consistent, connected, and trusted data to support their critical business operations and insights. However, new data sources, slow data movement between platforms, rigid data transformation workflows and governance rules, expanding data volume, and distributed data across clouds and on-premises, can cause organizations to fail when executing their data strategy.”
Centralizing all data in a data lake such as Hadoop or Amazon S3 was supposed to solve many of these problems, but it hasn’t worked out that way. Not every piece of data belongs in lakes, thanks to bandwidth and storage costs as well as sheer practicality. Technological progress also continues to churn out new digital innovations, and people are more than happy to try them out, which typically results in yet another data silo.
Data silos appear to be permanent houseguests. Just as Edwin Hubble’s raisin pudding analogy held that the expansion of the universe makes matter grow farther apart, the big data boom seems to be causing data repositories to drift further apart even as the overall volume of data continues expanding at a geometric rate. The data fabric is a way to layer some connective tissue among those sweet, sweet nuggets of data.
“Data fabric delivers a unified, integrated, and intelligent end-to-end data platform to support new and emerging use cases,” Yuhanna wrote. “It automates all data management functions–including ingestion, transformation, orchestration, governance, security, preparation, quality, and curation–enabling insights and analytics to accelerate use cases quickly.”
Data fabrics are essentially pre-integrated super-suites of data management tools. Instead of cobbling together separate products for handling the data functions that Yuhanna mentioned above (not to mention data catalogs), data fabrics deliver these functions through a single product. That brings consistency and repeatability to big data management processes, which helps breed trust in the data and the analytics that come from it.
Yuhanna sees a lot of data fabrics being deployed in cloud and hybrid cloud environments at the moment, particularly in support of applications like customer 360, business 360, fraud detection, IoT analytics, and real-time insights. Data fabrics are being deployed across multiple industries, including financial services, retail, healthcare, manufacturing, oil and gas, and energy, he wrote.
Data fabrics are also being deployed in the life sciences industry, where they can help knit disparate data silos into a seamless whole. One life sciences company that’s betting big on data fabrics is eClinical Solutions, a Massachusetts-based provider of software for running clinical trials.
In the past, clinical trials might have involved three or four disparate data sources, according to Raj Indupuri, eClinical Solutions’ CEO.
“But now with research we end up for every trial, every trial might have 15+ different sources, which means different streams of data, different structures, different formats, and different systems,” Indupuri said. “So the problem in terms of data chaos–we refer to this as data chaos–has only exploded or increased.”
In Indupuri’s view, the data fabric is a natural evolution of the data lake, or the lakehouse. These flexible data repositories are able to ingest and store just about any type of data, giving customers or stakeholders the ability to transform, prepare, and analyze the data when they need to. But when data spans multiple data lakes (or warehouses or lakehouses), that is where data fabrics play an important role.
“One big difference would be, instead of having everything in one centralized location, with the data fabric, that is how do you actually combine different stores,” he told Datanami in a recent interview. “They could be distributed. But on top we have a fabric so that with governance and with other capabilities, we’re able to deliver analytics to end stakeholders efficiently, to deliver it to downstream to different stakeholders in different systems.”
eClinical Solutions has already built some components of a data fabric solution into its offering. It has built an end-to-end data pipeline in AWS that automatically extracts metadata and catalogs it when a new piece of data lands in the system, according to Indupuri. The company’s solution also includes a data management workbench where data managers can review and clean data.
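To make the "extract metadata and catalog it on arrival" idea concrete, here is a minimal sketch in Python. It is not eClinical Solutions’ pipeline and does not use any AWS service; the `Catalog`, `CatalogEntry`, and `on_data_landed` names are hypothetical, the catalog is an in-memory dictionary, and the landed data is assumed to be a UTF-8 CSV file with a header row.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CatalogEntry:
    """Metadata recorded for one landed dataset (hypothetical schema)."""
    name: str
    columns: list[str]
    row_count: int
    size_bytes: int
    registered_at: str


@dataclass
class Catalog:
    """In-memory stand-in for a real metadata catalog."""
    entries: dict[str, CatalogEntry] = field(default_factory=dict)

    def register(self, entry: CatalogEntry) -> None:
        self.entries[entry.name] = entry


def on_data_landed(name: str, payload: bytes, catalog: Catalog) -> CatalogEntry:
    """Extract basic metadata from a CSV payload and catalog it.

    Assumes the first row of the payload is a header row.
    """
    reader = csv.reader(io.StringIO(payload.decode("utf-8")))
    header = next(reader, [])            # column names from the first row
    row_count = sum(1 for _ in reader)   # remaining rows are data rows
    entry = CatalogEntry(
        name=name,
        columns=header,
        row_count=row_count,
        size_bytes=len(payload),
        registered_at=datetime.now(timezone.utc).isoformat(),
    )
    catalog.register(entry)
    return entry
```

In a real deployment the function would presumably be triggered by a storage event rather than called by hand, and the entry would be written to a shared catalog service so that downstream tools can discover the new dataset.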
“We evolved significantly over a decade or so,” he said. “When we first started, it was kind of a report. Then we evolved into a data lake type of architecture, where you can stage any data, regardless of the source. Then we have embedded capabilities where it’s metadata driven, and you can actually transform and publish data marts within our data cloud.”
Where it gets tricky is dealing with the data repositories of eClinical Solutions’ own customers, who are drug companies or companies doing drug exploration. These customers often have separate data lakes for clinical research, for operational data, for safety data, and for regulatory data, and are loath to move or copy data between them.
“You can actually enable them to access data across these data stores, or these distributed data clouds or data lakes or data warehouses,” Indupuri said. “So that’s where data fabric can help.”
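A rough illustration of that access pattern, again in Python: the sketch below keeps two stores separate and answers a cross-store question by querying each one in place and combining only the result rows. Everything here is an assumption made for illustration. The `DataFabric` class, the `join_on` helper, the table names, and the sample rows are hypothetical, and two in-memory SQLite databases stand in for a customer’s clinical and safety data lakes.

```python
from __future__ import annotations

import sqlite3


class DataFabric:
    """Routes queries to registered stores; the data stays where it lives."""

    def __init__(self) -> None:
        self._stores: dict[str, sqlite3.Connection] = {}

    def register(self, name: str, conn: sqlite3.Connection) -> None:
        conn.row_factory = sqlite3.Row  # lets rows be converted to dicts
        self._stores[name] = conn

    def query(self, store: str, sql: str, params: tuple = ()) -> list[dict]:
        rows = self._stores[store].execute(sql, params).fetchall()
        return [dict(row) for row in rows]


def join_on(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Inner-join two result sets in memory on a shared key."""
    index: dict[object, list[dict]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Two independent stores with made-up sample data.
clinical = sqlite3.connect(":memory:")
clinical.execute("CREATE TABLE visits (subject_id TEXT, visit INTEGER)")
clinical.executemany("INSERT INTO visits VALUES (?, ?)", [("S1", 1), ("S2", 1)])

safety = sqlite3.connect(":memory:")
safety.execute("CREATE TABLE events (subject_id TEXT, event TEXT)")
safety.execute("INSERT INTO events VALUES (?, ?)", ("S1", "headache"))

fabric = DataFabric()
fabric.register("clinical", clinical)
fabric.register("safety", safety)

# Each store is queried where it sits; only result rows are combined.
combined = join_on(
    fabric.query("clinical", "SELECT subject_id, visit FROM visits"),
    fabric.query("safety", "SELECT subject_id, event FROM events"),
    key="subject_id",
)
```

A production fabric would layer governance, access control, and query pushdown on top of this routing step; the sketch shows only the core idea of leaving the underlying tables in their own stores.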
Related Items:
Data Mesh Vs. Data Fabric: Understanding the Differences
Data Fabrics Emerge to Soothe Cloud Data Management Nightmares
Big Data Fabrics Emerge to Ease Hadoop Pain