December 19, 2017

Data Lakes Crest In Drive to Boost Quality

George Leopold

As more data moves to the cloud, the composition of data lakes is shifting to new sources such as NoSQL databases while cloud data repositories emerge amid hybrid deployments, according to a big data survey.

The year-end survey released this week by “big iron” vendor Syncsort also found that Hadoop and especially Apache Spark continue to make inroads. Earlier enthusiasm for the Spark cluster-computing framework has translated into a shift to production workloads. The survey found that 70 percent of the organizations polled are either testing or in production. Forty percent said they are in production with either Hadoop or Spark, while 30 percent are running proof-of-concept or pilot programs.

The survey released on Monday (Dec. 18) also underscores growing concerns about data quality and regulatory compliance as companies brace for new European Union data privacy rules to kick in next May. Syncsort reported that 40 percent of survey respondents, mostly in the financial and insurance sectors, said unreliable data is a continuing problem, contributing to the steady shift to data lakes as a way to improve data quality.

Meanwhile, compliance with rules such as the EU General Data Protection Regulation is forcing companies to expand the scope of data governance as they place “a higher priority on putting processes in place that allow them to understand what their data is and where it has been,” the survey noted.

As ephemeral streams of data increasingly make their way into more permanent data lakes, 71 percent of those polled by Syncsort identified ETL as the most compelling use case. That result was well ahead of predictive, real-time and other analytics use cases, perhaps illustrating the pressing need for better data preparation tools as data lakes fill up faster with more unstructured sources.

“We are seeing increased adoption of data lake initiatives where organizations are very focused on governance of the data in the data lakes, increasing benefits through advanced analytics and machine learning and deployment of hybrid environments including cloud,” Tendü Yoğurtçu, Syncsort’s CTO, noted in a statement releasing the fourth annual survey findings.

“But those benefits can only be unlocked if organizations have access to enterprise data, can create trusted data sets and establish effective data governance practices,” Yoğurtçu continued. “This propels them to a place where they can not only adapt to digital disruption, but take advantage of it so their businesses thrive.”

As more companies embrace real-time capabilities such as Spark, the survey’s authors assert that customers will shift away from legacy platforms in hopes of harnessing data while reaping savings from investments in new data tools.

Syncsort said it polled nearly 200 respondents, including data architects, IT managers, developers, business intelligence and data analysts as well as data scientists at companies running either Hadoop or Spark. Among the industries represented are financial services and insurance, healthcare, government, telecommunications and retail.
