The Future of Data Management: It’s Already Here
Analysts such as Gartner claim that data fabric is the future of data management. But, in fact, the future is already here. We see many signs of market maturity, ranging from total-addressable-market projections to vendors touting ROI. Data fabric's unique ability to integrate enterprise data and reduce repetitive tasks in data discovery, analysis, and implementation is the reason many believe this will be the breakout year for the modern data integration approach.
Gartner defines data fabric as a design concept that serves as an integrated layer, or fabric, of data and connecting processes. A data fabric enables data that is dispersed across various locations and used by different applications to be accessed and analyzed in real time within a unifying data layer, under the same management and security. It does this by leveraging both human and machine capabilities.
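The "unifying data layer" in that definition can be pictured as a thin access layer that registers heterogeneous sources and answers one query across all of them. The sketch below is purely illustrative (the class, source names, and records are made up, not any vendor's API); it shows the shape of the idea, not a real implementation.

```python
# A minimal sketch of a unifying data layer: heterogeneous sources are
# registered behind one access interface, and a single query spans them all.
# All names here (DataFabric, "crm", "logs") are illustrative assumptions.

class DataFabric:
    def __init__(self):
        self._sources = {}  # source name -> callable returning records

    def register(self, name, fetch):
        """Attach a data source via a fetch callable (its connector)."""
        self._sources[name] = fetch

    def query(self, predicate):
        """Apply one predicate across every registered source."""
        results = []
        for name, fetch in self._sources.items():
            for record in fetch():
                if predicate(record):
                    # Tag each hit with where it came from.
                    results.append({**record, "_source": name})
        return results

# Two dispersed "sources": an on-prem CRM table and a cloud log store.
fabric = DataFabric()
fabric.register("crm", lambda: [{"customer": "Acme", "region": "EU"}])
fabric.register("logs", lambda: [{"customer": "Acme", "event": "login"}])

# One query spans both sources under the same management layer.
acme = fabric.query(lambda r: r.get("customer") == "Acme")
```

In a real fabric the connectors would push the predicate down to each source rather than fetching everything, but the consumer-facing contract is the same: one query surface over many stores.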
The data fabric model is continuing to grow into an established technology largely because data is growing exponentially, data sources are becoming more distributed, and many enterprises still haven’t figured out how to get the useful data needed to drive their bottom line. As a result, the businesses that leverage data fabrics will be the ones to succeed.
Dissecting Data Fabrics — More Than a Sum of Their Parts
Some believe that a data fabric is just another term for a metadata management system. To be sure, enterprises should include a metadata-driven design to dynamically support different data delivery styles and to ensure a successful data fabric. But that is only the beginning.
Despite the successful use of data virtualization in data fabrics, it’s wrong to define a data fabric as a system that virtualizes and hides other data sources. Yes, data virtualization creates a data abstraction layer to integrate all data without physically moving it. But data fabrics don’t stop there either. Others consider a data fabric to be a method to access all the file-level data from any machine in their data center. This is true but, again, it is only a piece of the true data fabric.
Using both human and machine capabilities, a data fabric includes all of the above-mentioned components and provides an orchestrated approach for collecting, unifying, and governing data sources throughout the enterprise data management system. In fact, many early adopters built a data fabric to solve a narrower issue or to succeed in a particular use case only to discover other ways its capabilities can be used.
The Convergence of Triggering Factors
During the Covid-19 pandemic, numerous industries adopted digital transformation to survive. These changes increased the demand for accessible data, resulting in increased adoption of the data fabric concept. But the data fabric adoption shift was already well underway. The three Vs of data (volume, variety, and velocity) remain a persistent challenge, compounding other data concerns that data fabrics are well suited to address.
Take security management and fraud detection and prevention, for instance. A data fabric can automatically detect data abnormalities and take appropriate steps to correct them, reducing losses and improving regulatory compliance. It enables organizations to define governance norms and controls, strengthen risk management, and improve monitoring. This matters more than ever, as legal standards for data governance and risk management have become more demanding and compliance vital. It also yields cost savings by helping organizations avoid potential regulatory penalties.
A data fabric represents a fundamentally different way of connecting data. Those who have adopted one now understand that they can do many things differently, providing an excellent route for enterprises to reconsider a host of issues. Because data fabrics span the entire range of data work, they address the needs of all constituents collectively: developers, business analysts, data scientists, and IT team members. As a result, proof-of-concept projects will continue to grow across departments and divisions.
As the need for data sharing for big data, small data, analytics, business agility, and AI/ML persists, enterprises now realize that it’s helpful to have multi-API access that goes back to the same data fabric.
According to Gartner, data fabric is becoming increasingly popular because it is a single architecture that can address the diversity, distribution, scale, and complexity of an organization's data assets. Gartner also states that the approach reduces the time for integration design by 30%, deployment by 30%, and maintenance by 70%, because data fabric designs draw on the ability to use, reuse, and combine different data integration styles.
The same report credits the approach for driving automated data and metadata discovery, data quality, and integration, which together drive augmented data management. Automating repetitive tasks across data quality, mastering, and integration solutions is known to lower the overall costs of these solutions by 35% to 65%, depending on the existing approach.
It also allows organizations to build resilient applications that continue to function despite component failures, a difficult task that becomes harder when applications are distributed. Resiliency is gaining importance as organizations rapidly deploy software across multiple tiers and technology infrastructures. Yet achieving resilience requires planning at every level of the architecture, and continuous review.
The desire to standardize APIs, increase the consistency of access, and create easy ways to import and consume all kinds of data within an organization is becoming paramount. A well-crafted data fabric addresses these objectives and makes applications resilient against changes and errors in data sources.
The Emergence of Undeniable Benchmarking Proof
Organizations are also looking for ways to leverage very large public datasets such as Wikidata, the structured counterpart of Wikipedia and other Wikimedia projects. The largest open RDF dataset, Wikidata contains 17 billion triples describing about 100 million entities, which may be why enterprises are increasingly interested in combining such public data sources with their own internal data. Public data also gives organizations an easy way to compare the benchmarked work of various data fabric enablers, provided vendors and integrators benchmark how fast they create databases and how well queries perform at massive scale. As benchmarks become more publicly available, they will further demonstrate that the technology underlying data fabrics can produce exceptional results.
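For readers unfamiliar with RDF, each of those billions of Wikidata statements is a (subject, predicate, object) triple, and SPARQL-style querying boils down to pattern matching over the triple set. The toy sketch below illustrates that core operation in plain Python; the three sample triples use real Wikidata identifiers (Q42 is Douglas Adams, P31 is "instance of"), but the in-memory approach is for illustration only, since real engines use indexes to do this at billions-of-triples scale.

```python
# Illustrative only: an RDF dataset is a set of (subject, predicate, object)
# triples; SPARQL-style querying is pattern matching over them.

triples = {
    ("wd:Q42", "rdfs:label", "Douglas Adams"),
    ("wd:Q42", "wdt:P31", "wd:Q5"),       # instance of: human
    ("wd:Q5",  "rdfs:label", "human"),
}

def match(pattern):
    """Return every triple matching an (s, p, o) pattern; None is a wildcard."""
    s, p, o = pattern
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# "What is Q42 an instance of?"
hits = match(("wd:Q42", "wdt:P31", None))
```

Benchmarking a data fabric enabler at scale amounts to timing exactly this kind of pattern matching, plus joins across patterns, over billions of triples rather than three.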
Enterprise Knowledge Graphs as an Entry Point
Because data fabric describes an integrated suite of data management technologies, it means it could be built in a variety of ways. However, capabilities such as semantic knowledge graphs, active metadata management, and embedded machine learning (ML) are necessary components to ensure a successful data fabric design.
Enterprise Knowledge Graphs (EKGs) enable all three traits, so they are considered an ideal entry point into data fabric creation. In fact, many organizations are adopting EKGs to build a single data layer rather than ripping and replacing their existing data warehouses and data lakes.
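The "single data layer without rip and replace" point can be made concrete with a toy graph: the knowledge graph records where entities live and how they relate, while the data itself stays in the existing warehouse and lake. All node and relation names below are invented for illustration.

```python
# A toy sketch of an enterprise knowledge graph as a linking layer.
# Entities stay in the warehouse and lake; the graph only records
# where they live and how they relate. Names are hypothetical.

from collections import defaultdict

graph = defaultdict(list)  # node -> [(relation, target_node)]

def add_edge(src, rel, dst):
    graph[src].append((rel, dst))

# Link a customer entity to data already sitting in two separate stores.
add_edge("customer:acme", "stored_in", "warehouse:orders")
add_edge("customer:acme", "mentioned_in", "lake:support_tickets")
add_edge("warehouse:orders", "governed_by", "policy:gdpr")

def neighbors(node, rel):
    """Follow one relation type out of a node."""
    return [dst for r, dst in graph[node] if r == rel]

# One traversal answers "where does Acme's data live?" across both stores.
locations = (neighbors("customer:acme", "stored_in")
             + neighbors("customer:acme", "mentioned_in"))
```

Because the graph holds links and metadata rather than copies of the data, the warehouse and lake keep operating unchanged, which is precisely why an EKG works as a gradual entry point rather than a replacement project.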
In the above-mentioned report, Gartner posits that "data fabric is the foundation," as the method improves existing infrastructure, gradually adds automation to overall data management, and combines traditional practices with emerging ones. To succeed with data fabric, the same report advises, organizations must ensure it supports the dynamic combination of different data delivery styles (through metadata-driven design) for specific use cases; operationalize the data fabric by implementing continuous, evolving data engineering practices across the data management ecosystem; and build it by leveraging existing, well-understood, and established integration technologies and standards, while continuing to educate the team on new approaches and practices such as DataOps and data engineering, including in edge environments.
About the Author: Navin Sharma is vice president of product at Stardog, a leading enterprise knowledge graph (EKG) platform provider. Navin is a self-described intrapreneur and a seasoned product management executive who thrives at the intersection of technology innovation and business challenges, creating value for both the employer and the customer.