Data Fabric Brings Data Together for Timely Decisions
In addition to the explosion of data volumes, many organizations are struggling with an explosion in the number of data sources and data silos. Managing data in this fluid, ever-changing environment is a major challenge for would-be data-driven organizations, but one pattern that offers potential salvation for the stressed data architect is the data fabric.
Data fabrics aren’t new. We’ve been writing about them for several years here at Datanami. In the early days, the definition of a data fabric was a bit loose. But lately, it’s begun to harden and the core elements of a data fabric have coalesced into a configuration that’s finding traction in the real world.
Forrester analyst Noel Yuhanna was one of the early proponents of the data fabric. In the latest Forrester Wave: Enterprise Data Fabric, Q2 2022, Yuhanna dived into the benefits of the data fabric and dissected the offerings of 15 data fabric vendors.
“Today, delayed insights can have a devastating effect on a firm’s ability to win, serve, and retain customers,” Yuhanna wrote in the Wave report. “Organizations want real-time, consistent, connected, and trusted data to support their critical business operations and insights. However, new data sources, slow data movement between platforms, rigid data transformation workflows and governance rules, expanding data volume, and distributed data across clouds and on-premises, can cause organizations to fail when executing their data strategy.”
Centralizing all data in a data lake such as Hadoop or Amazon S3 was supposed to solve many of these problems, but it hasn’t worked out that way. Not every piece of data belongs in lakes, thanks to bandwidth and storage costs as well as sheer practicality. Technological progress also continues to churn out new digital innovations, and people are more than happy to try them out, which typically results in yet another data silo.
Data silos appear to be permanent houseguests. Just as Edwin Hubble’s raisin pudding analogy held that the expansion of the universe makes matter grow farther apart, the big data boom seems to be causing data repositories to drift further apart even as the overall volume of data continues expanding at a geometric rate. The data fabric is a way to layer some connective tissue among those sweet, sweet nuggets of data.
As Yuhanna wrote: “Data fabric delivers a unified, integrated, and intelligent end-to-end data platform to support new and emerging use cases. It automates all data management functions–including ingestion, transformation, orchestration, governance, security, preparation, quality, and curation–enabling insights and analytics to accelerate use cases quickly.”
Data fabrics are essentially pre-integrated super-suites of data management tools. Instead of cobbling together separate products for each of the data functions that Yuhanna lists (not to mention data catalogs), data fabrics deliver these functions through a single product. That brings consistency and repeatability to big data management processes, which helps breed trust in the data and in the analytics that come from it.
Yuhanna sees a lot of data fabrics being deployed in cloud and hybrid cloud environments at the moment, particularly in support of applications like customer 360, business 360, fraud detection, IoT analytics, and real-time insights. Data fabrics are being deployed across multiple industries, including financial services, retail, healthcare, manufacturing, oil and gas, and energy, he wrote.
Data fabrics are also being deployed in the life sciences industry, where they can help knit disparate data silos into a seamless whole. One life sciences company that’s betting big on data fabrics is eClinical Solutions, a Massachusetts-based provider of software for running clinical trials.
“But now with research we end up for every trial, you might be having 15+ different sources, different streams of data, different structures, different formats, different systems,” said Raj Indupuri, the company’s CEO. “So the problem in terms of data chaos–we refer to this as data chaos–has only exploded or increased.”
In Indupuri’s view, the data fabric is a natural evolution of the data lake, or the lakehouse. These flexible data repositories are able to ingest and store just about any type of data, giving customers or stakeholders the ability to transform, prepare, and analyze the data when they need to. But when data spans multiple data lakes (or warehouses or lakehouses), that is where data fabrics play an important role.
“One big difference would be, instead of having everything in one centralized location, with the data fabric, that is how do you actually combine different stores,” he told Datanami in a recent interview. “They could be distributed. But on top we have a fabric so that with governance and with other capabilities, we’re able to deliver analytics to end stakeholders efficiently, to deliver it to downstream to different stakeholders in different systems.”
eClinical Solutions has already built some components of a data fabric solution into its offering. It has built an end-to-end data pipeline in AWS that automatically extracts metadata and catalogs it when a new piece of data lands in the system, according to Indupuri. The company’s solution also includes a data management workbench where data managers can review and clean data.
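To make the "catalog on arrival" idea concrete, here is a minimal Python sketch of a handler that extracts basic metadata from a newly landed CSV object and registers it in a catalog. It is an illustration only, not eClinical Solutions’ implementation: the `Catalog`, `CatalogEntry`, and `on_data_landed` names, the metadata fields, and the sample keys and payloads are all hypothetical, and an in-memory dictionary stands in for whatever catalog service a real AWS pipeline would use.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata recorded for one landed object (hypothetical schema)."""
    key: str
    source: str
    size_bytes: int
    columns: list
    row_count: int


@dataclass
class Catalog:
    """In-memory stand-in for a metadata catalog."""
    entries: dict = field(default_factory=dict)

    def register(self, entry):
        self.entries[entry.key] = entry

    def find_by_column(self, column):
        """Return the keys of all cataloged datasets that expose a column."""
        return sorted(k for k, e in self.entries.items() if column in e.columns)


def on_data_landed(catalog, key, source, payload):
    """Called when a new CSV object arrives: extract metadata, then catalog it."""
    rows = list(csv.reader(io.StringIO(payload.decode("utf-8"))))
    header = rows[0] if rows else []
    entry = CatalogEntry(
        key=key,
        source=source,
        size_bytes=len(payload),
        columns=header,
        row_count=max(len(rows) - 1, 0),  # exclude the header row
    )
    catalog.register(entry)
    return entry


# Usage: two hypothetical feeds for the same trial land in the system.
catalog = Catalog()
labs = on_data_landed(
    catalog, "trial-001/labs.csv", "lab_vendor",
    b"subject_id,test,value\n1001,ALT,32\n1002,ALT,41\n",
)
visits = on_data_landed(
    catalog, "trial-001/visits.csv", "edc",
    b"subject_id,visit_date\n1001,2022-01-05\n",
)
shared = catalog.find_by_column("subject_id")
```

Because every arrival is cataloged the same way, a data manager could ask which datasets share a column such as `subject_id` without opening any of the files.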
“We evolved significantly over a decade or so,” he said. “When we first started, it was kind of a report. Then we evolved into a data lake kind of an architecture, where you can stage any data, regardless of the source. Then we have embedded capabilities where it’s metadata driven, you can actually transform and publish data marts within our data cloud.”
Where it gets tricky is dealing with the data repositories of eClinical Solutions’ own customers, who are drug companies or companies doing drug exploration. These customers often have separate data lakes for clinical research, for operational data, for safety data, and for regulatory data, and are loath to move or copy data between them.
“You can actually enable them to access data across these data stores, or these distributed data clouds or data lakes or data warehouse,” Indupuri said. “So that’s where data fabric can help.”
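The pattern Indupuri describes, with data left in separate stores and a governed access layer on top, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor’s product: `DataStore`, `DataFabric`, the role names, and the sample rows are all hypothetical, and plain lists stand in for real lakes and warehouses.

```python
from dataclasses import dataclass


@dataclass
class DataStore:
    """Stand-in for one independent lake or warehouse; its data stays in place."""
    name: str
    rows: list

    def query(self, predicate):
        return [r for r in self.rows if predicate(r)]


class DataFabric:
    """Hypothetical access layer: one entry point, per-store governance rules."""

    def __init__(self):
        self._stores = {}
        self._allowed_roles = {}

    def attach(self, store, allowed_roles):
        self._stores[store.name] = store
        self._allowed_roles[store.name] = set(allowed_roles)

    def query(self, role, predicate):
        """Query every store the role may read and tag each row with its origin."""
        results = []
        for name, store in self._stores.items():
            if role not in self._allowed_roles[name]:
                continue  # governance: skip stores this role cannot access
            for row in store.query(predicate):
                results.append({**row, "_store": name})
        return results


# Usage: clinical and safety data live in separate stores.
clinical = DataStore("clinical", [{"subject_id": 1001, "arm": "A"}])
safety = DataStore("safety", [
    {"subject_id": 1001, "event": "headache"},
    {"subject_id": 1002, "event": "nausea"},
])

fabric = DataFabric()
fabric.attach(clinical, ["data_manager", "safety_reviewer"])
fabric.attach(safety, ["safety_reviewer"])


def is_subject_1001(row):
    return row["subject_id"] == 1001


reviewer_rows = fabric.query("safety_reviewer", is_subject_1001)
manager_rows = fabric.query("data_manager", is_subject_1001)
```

Here the safety reviewer sees matching rows from both stores, while the data manager sees only the clinical store. Nothing is copied between stores ahead of time; only the rows that match a query are returned to the caller.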
August 8, 2022