Surveys Find Cloud Surge, Mixed Data
A steady shift to the cloud and continuing momentum for business intelligence workloads highlight an annual industry survey aimed at gauging the maturity of big data technology.
Separately, another big data assessment of Hadoop users released this week found that the data sources and volumes required for expanding database projects reflect a mix of traditional sources and emerging ones such as data-gathering sensors.
Self-service analytics platform vendor AtScale said its annual big data “maturity” poll found a “surge” of cloud deployments for projects based on Hadoop, Spark and business intelligence frameworks. More than half of the 1,400 analytics professionals it surveyed said they have deployed big data in the cloud while 72 percent said they plan to soon.
The company said the shift to cloud platforms has surfaced over the last year, indicating that big data projects are maturing and delivering on initial investments. The company cited estimates of “double digit” growth in cloud adoption on an annual basis.
Along with growing cloud adoption, which was also measured in terms of the number of network nodes used for big data projects, the survey found that business intelligence has emerged as the top big data workload. Previously, ETL and data science were the leading workloads for big data.
Meanwhile, self-service access to big data grew 15 percent year-on-year despite a general lack of access to self-service tools. Accessibility, data security and governance ranked among the top concerns, with worries about data governance jumping 21 percent since the end of 2015. Governance concerns are driven in part by a data transfer deal between the U.S. and the EU that requires American companies importing personal data from Europe to “commit to robust obligations on how personal data is processed and individual rights are guaranteed.”
The AtScale survey also found that a majority of companies polled use the Spark streaming analytics engine for educational purposes while 73 percent of respondents said their Hadoop projects were in production. Tellingly, “Organizations who have deployed Spark in production are 85 percent more likely to achieve value,” the survey concluded.
AtScale, San Mateo, Calif., said its annual survey was conducted in collaboration with Cloudera, Hortonworks (NASDAQ: HDP), MapR, Cognizant, Trifacta and Tableau Software (NYSE: DATA).
Meanwhile, a Hadoop user survey released this week by big data software vendor Syncsort found that legacy systems such as data warehouses, relational database management systems and mainframes remain the largest data sources. However, the study noted that newer sources such as smart devices and sensors generating streaming data are boosting data volumes.
Hence, the software vendor argues that integrating data from all sources, including batch and streaming, is growing in importance. Syncsort’s poll of more than 250 data users also confirms the AtScale finding that data governance is growing in importance for Hadoop users.
“As Hadoop implementations spread across organizations, data governance, and the data quality needed to support it, will become more critical to meet the regulatory and compliance mandates,” noted the survey by Syncsort, Pearl River, N.Y.